No professional man, then, thinks of giving according
to measure. Once engaged, he gives his best, gives his
personal interest, himself. His heart is in his work, and
for this no equivalent is possible; what is accepted is in
the nature of a fee, gratuity, or consideration, which enables
him who receives it to maintain a certain expected
mode of life. The real payment is the work itself, this
and the chance to join with other members of the profession
in guiding and enlarging the sphere of its activities.
GEORGE HERBERT PALMER, in The Ideal Teacher.

A. INTRODUCTION

The beginnings of the examination idea are lost in the
hoary past. History does not provide us with the exact
date when this concept originated, although students of
education do know that even the most primitive peoples
required their young men to undergo various examinations to
determine physical fitness, proficiency in the art of warfare,
and ability to procure food. How different the
examinations of today are from these early prototypes!
These primitive peoples with their tribal organization
reserved the examinations of the eligible young men for a
time when elaborate ceremonies could be held. At the
climax of the festivities the candidates publicly submitted
to their examinations. Passing the examination
entitled the young warriors to full membership in the
tribal organization. These ceremonial periods were eagerly
anticipated by all because of the feasting and
merry-making that accompanied them.

The gradual changes in the examination concept will 
be treated more fully later in this report. Let it 
suffice at this point to indicate that an evolutionary 
process went on over a period of thousands of years. 


Until a quarter of a century ago educators attempted to
measure achievement entirely by two means, viz., informal oral
testing and the traditional written examination. In a series of
scientific studies, it was shown that the traditional or
essay-type examination was not satisfactory as a measuring
instrument because of its unreliability, due to limited
sampling and subjectivity in scoring. Just prior to this
time, certain great educational authorities were experimenting
with rating scales and tests in the fields of
spelling, arithmetic, handwriting, and composition with
the intent of removing the subjective factor in teacher
marking. Coincident with this epochal development was the
work of the French genius Alfred Binet in the field of
psychological testing. These advances focused the
attention of progressive educators throughout the world
on tests and measurements.

At the present time it is safe to say that there are
hundreds of published tests purporting to be scientific
measuring instruments. Not only that, but each year new
books and treatises are published to swell the ever-increasing
mass of knowledge about testing methods and
techniques. The research student is amazed at the great
mass of material that exists in this field. Notwithstanding
this, it is safe to say that thousands of high school teachers
in the United States are ignorant of the principles
and procedures underlying the well-rounded testing program
that is so necessary for effective teaching. All too
often the classroom teacher has been content to administer
published tests just to show his supervisors that he is
progressive. In many cases the main values of the tests
were lost because no provision was made for diagnosis
and remedial teaching. What should impress the educational
expert is that many high schools have no definite,
co-ordinated testing program. Commenting on this
situation, Blackstone makes the following observation:
"Each teacher gave examinations and short tests according
to any haphazard sort of plan that suited his fancy, and
there was rarely any attempt to perfect these devices, to
make sure that they measured fairly and accurately, or to
extend their scope outside the narrow realm of measuring
factual knowledge and an understanding of some [...]" (1)

In order to reap the full benefits of twentieth-century
education, the individual teacher should work out a sound
testing program and follow it, rather than give tests
whenever it suits his convenience.

An equally potent reason why teachers are turning
more and more to the new-type examination is the
necessity born of depression conditions. "Large
organizations of taxpayers everywhere are insistent that
the costs of government must be reduced. Since a large
part of local taxes goes to the support of education,
the public schools of America are facing a critical
period of economic adjustment. Reductions in school
budgets are being met by larger classes and increased
teaching loads." (2) This movement has not been confined
to merely a few high schools; in all parts of the
country, the same trend is evident. Teachers and
educators may bitterly attack this tendency, but it is
continuing nevertheless. It is a physical impossibility
for the ordinary teacher with large classes to quiz
orally all his pupils daily. This dilemma would be
insoluble were it not for new-type tests with their
correcting and scoring methods.

(1) Blackstone, Earl G., Com. Edu. in the High School,
General Editor, Harry Kitson, Ginn & Co., 1929.

Supervisors and administrators are also utilizing
the new-type examination to an increasing degree.
Both supervision and testing look to the same goal, the
improvement of teaching efficiency. Supervisors no
longer have the temerity to step into a classroom and
attempt to judge the teacher's ability by the
old impressionistic method in ten or fifteen minutes.
The supervisor is aided tremendously by the results of
new-type tests administered to the classes in the [...]

(2) Carlson, Paul A., The Measurement of Business
Education, pp. 21, South-Western Publishing Co., 1932.

"Supervision based upon test results tends to be positive
and constructive. It forms a scientific approach for
helpful conferences, suggestions, and experimentation.
Without the use of test results, supervision is a hit-and-miss
procedure and is sometimes valueless, if not even
[...]." (3) To the superintendent, standardized
tests have meant nothing less than the ultimate changing
of school administration from guess work to scientific
[...]. (4) In conclusion, then, we may say that the new-type
examination with its accompanying correcting and
scoring procedures has been a material factor in
lightening the work of all parties concerned: principals,
supervisors, heads of department, and the classroom
teacher.

(3) Lang, Albert R., Modern Methods in Written Examina-
tions, pp. 291, Houghton-Mifflin Co., 1930.

(4) Cubberley, Ellwood P., Public Education in the United
States, Houghton-Mifflin Co., 1934.

B. HISTORICAL SUMMARY OF THE TESTING MOVEMENT

The beginnings of examinations go back before the
dawn of recorded history. It is true, however, that the
early examination was very different from the modern
product; a process of evolution has gone on, culminating
in the standardized objective and intelligence tests of
today. Early history supplies us with a number of instances
of examinations. In the Old Testament we read
how the Gileadites tested the Ephraimites upon their
ability to pronounce the word "Shibboleth." The unlucky
Ephraimite who pronounced it "Sibboleth" failed in his
examination and was speedily put to death.

This tragic test is not, however, to be considered
the first instance of examination. "As early as 2200 B.C.,
China had an elaborate national system of examinations
for the purpose of selecting public officials," (1)
and "examinations of various kinds were in use hundreds
and even thousands of years ago among such peoples as the
Chinese, the Greeks, and the Romans." (2) It is safe to say
that the tests of mental and physical traits involved
in the initiation ceremonies of primitive peoples
antedated even these early attempts at examination.

(1) Lang, Albert R., Modern Methods in Written Examina-
tions, pp. 1, Houghton-Mifflin Co., 1930.

(2) Odell, C. W., Educational Measurement in High School,
pp. 55, The Century Co., 1930.

Even primitive peoples were not slow to recognize the
necessity for measuring the achievement of pupils as
an essential element of education. According to Alberty
and Thayer: "It may be said that even under primitive
conditions, the measurement of achievement of pupils is
an essential element of education. Before the youth may
share fully in the life of the male members of the tribe,
he has to receive certain elementary training at the
hands of the women, the end of which is marked by rites
and ceremonies which indicate the fitness of the individual
to participate fully in adult life." (3)

In ancient times the Egyptians had taken tremendous
strides in perfecting tests for physical relationships.
Russell tells of their accomplishments thus: "Early
peoples, such as these Egyptians, developed certain
measures to a high degree and were able to achieve
astonishing results, both mechanically and scientifically,
in the fields so developed. In other fields they were
restricted. In surveying, in some of the mechanic arts,
and in such astronomical observation as could be accomplished
without the use of the telescope, these people were
highly [...]." (4) The evidence does not prove, however,
that the Egyptians made any proportionate development in
achievement or any other kind of indirect testing.

(3) Alberty, H. R., and Thayer, V. T., Supervision in the
Secondary School, pp. 328, D. C. Heath & Co., 1931.

(4) Russell, Charles, Standard Tests, pp. 4, Ginn & Co., 1930.

In the early Grecian period, the Athenians and
Spartans had attained considerable fame for their systems
of living and education. The Spartan system sought
primarily to inculcate in its youth the ideals of
physical perfection and martial courage, whereas the
Athenian system aimed at a combination of physical
development and cultural attainment. The Athenian system
recognized the individual; the Spartan considered the
individual only as an integral part of the state. Russell
says, "In these two systems of living and of education are
found the two contrasting elements of primitive education
and cultural education. The Spartan education had been
formalized and systematized until it had lost every semblance
of individualism; the Athenian education had been
freed until it became in the end almost completely
individualistic." (5) The Spartans tested the ability to endure pain
by conducting regular examinations, in the form of whippings,
before the altar of Artemis. (6) Socrates, in his
famous method of questioning, submitted his pupils to
searching questions which really amounted to a form of incessant
examination. The examination concept had its genesis in
antiquity and has slowly developed through the ages.

(5) Ibid, pp. 16.

(6) Lang, Albert R., Modern Methods in Written Examina-
tions, pp. 2, Houghton-Mifflin Co., 1930.

The medieval period contains two influences that
affected the growth of examinations, viz., chivalry and the
medieval universities. Chivalry, or the medieval system
of knighthood, had as its ideal of education the preparation
of the young page to take his place in the ranks of
knighthood. His education consisted of two phases:
first, training in the art of warfare given him by
the men-at-arms and knights, and secondly, training in the
etiquette and ideals of knighthood, a phase of education
which was entrusted to the ladies of the castle. Both
elements of education were considered necessary before the
candidate for knighthood could qualify as a full-fledged
knight. The raising of the esquire to the estate of
knighthood was embodied in appropriate ceremonies that
really were the equivalent of an examination. Russell
makes the following comment: "Education in chivalry
considered from the point of view of one destined to become
a knight partook somewhat of the Spartan form of
education, since the page was drilled in all types of
exercises to fit him for [...]." (7)

(7) Russell, Charles, Standard Tests, pp. 17, Ginn & Co., 1930.

The medieval universities had a significant influence
in determining the trend of examinations. The medieval
university was simply a guild or association of persons
who were interested in teaching. Its members were divided
according to three levels of attainment: apprentice,
journeyman, and master. The completion of the
lower level entitled the candidate to a baccalaureate
degree. Upon completion of the journeyman level, the
candidate was entitled to the master's or doctor's degree
if he succeeded in disputing and defending a thesis. "There
are records of such examinations at the University of
Bologna as early as A.D. 1419, and at the University of
Paris by the end of the thirteenth century." (8) The broadening
of the examination concept due to the influence of
medieval universities is of great import because this
institution was the crucible that produced the written
form of examination. According to Lang: "Probably the
first written examination at a university was in 1702, when
it was introduced at Cambridge, England. . . . There
seems to be no doubt but that the universities of the
middle ages gave the examination system to our western
[...]." (9)

(8) Lang, Albert R., Modern Methods in Written Examina-
tions, pp. 3, Houghton-Mifflin Co., 1930.

(9) Ibid, pp. 3.

The Boston examination of 1845 is an important landmark
in the development of tests and measurements in the
United States. Students of the history of education
attach great weight to this early examination for a number
of reasons, primarily because it was the first comprehensive
written examination to be administered in any school
system of this country. The details of how the examination
came into being are interesting. The school committee was
empowered to make an inspection of the schools each year.
Part of this inspection consisted of an oral examination
of the pupils. Year by year, as the enrollment of the
schools grew, it became increasingly difficult for the
school committee members to get around to all the schools
and adequately test the pupils on the subject matter they
had studied. Realizing that they faced a formidable, if
not impossible, job, the school committee decided to test
merely the highest class in each school. After a while,
even this task began to be performed in a perfunctory way.

In 1845, the school committee decided to delegate to
a subcommittee the duty of visiting schools and testing
the pupils on their intellectual attainments. This subcommittee
attacked its task with great thoroughness and
seriousness of purpose. It decided to give a written
examination, so a series of questions was drawn up in
various subjects, including astronomy, definitions, geography,
grammar, history, and natural philosophy. These
questions were drafted very carefully, and a conscious
effort was made to arrange the questions in each subject
in increasing order of difficulty. The committee
followed this procedure in order to include a few questions
that probably even the poorest pupils could answer,
and a few that would probably be beyond the mental powers
of the best pupils. These questions were then printed,
to be handed out to the pupils on the day of the examination.

The insight that these committee members had into the
subject of testing is amazing. The committee planned the
details for administering the examination with a great
deal of care. They were careful that no copies of the
examination got out. By eight o'clock in the morning they
appeared unannounced at different schools, each of the
three examiners taking a different school. Boston had
nineteen grammar schools at that time, and these tests
were to be given only to the highest classes, which included
five hundred and thirty pupils. The committee member
examining a class would first see that all books and
reference materials were put away; next the pupils were
seated far enough apart to prevent communication; then they
were warned that only one hour was allowed for the examination
and that they were not to spend time on handsome writing;
after this, the printed question sheets were handed out and
the pupils started. Promptly at the end of the hour the
examination papers were collected and the examiner hurried
on to the next school on his schedule. In that way four
schools were finished by each examiner in the morning, and
three more in the afternoon. The next day, the committee
members took another subject and went through the same
procedure. They continued this until they had given
examinations on all of the subjects previously listed.

The answers to the questions were scored carefully,
and the results tabulated and analyzed. In order to
score the papers as uniformly as possible, a set of rules
was prepared covering doubtful points. These rules
were used as guides rather than being rigidly adhered to.
The committee was very conservative in its comments upon
the examination; nevertheless, the results were quite
startling. "The inefficiency revealed by the survey was
as great a surprise and disappointment to the school
committee as many of our modern survey reports have
proved to be to those who made them." (10) The completeness
and thoroughness of this Boston examination place it
among the most remarkable incidents in the history of
education in the United States. (11)

(10) Caldwell, Otis W., and Courtis, Stuart A., Then and
Now in Education 1845:1923, pp. 7, World Book Co., 1924.

(11) Lang, Albert R., Modern Methods in Written Examina-
tions, pp. 7, Houghton-Mifflin Co., 1930.

In 1837 Horace Mann had been appointed Secretary of
the State Board of Education in Massachusetts. His
appointment was made in the face of determined opposition
from the group of thirty schoolmen who served as masters
of the Boston Public Schools. These schoolmen were
afraid that the new secretary would infringe upon their
prerogatives. Mann would be classified today as a progressive
educator. Naturally his ideas would clash with
those of this conservative band of schoolmasters. "If
Horace Mann did not precipitate the controversy with the
Boston schoolmasters, at least he welcomed their opposition
as an opportunity to direct the attention of the wealthiest
and most ambitious city in the state to conditions much
in need of [...]." (12) The Boston examinations of 1845 made
a profound impression on Mann because he recognized the
significance of the scientific method applied to education,
and he hailed the report as the dawn of a new era.

(12) Caldwell, Otis W., and Courtis, Stuart A., Then and
Now in Education 1845:1923, pp. 7, World Book Co., 1924.

Among his other duties, Horace Mann edited the Common
School Journal. In this periodical, he published copious
extracts from the Report and discussed it and the subject
of examinations at length. In his comments, Mann points
out why he thinks the written examination is far superior
to its oral predecessor. His comments reflect the
brilliance of the educational statesmanship that
later was to be such a cogent factor in the development
and trend of the American philosophy of education.

Mann's arguments, briefly summarized, were as follows:

"1. It is impartial
2. It is fairer to the scholars
3. It is more thorough
4. It prevents officious interference of the teacher
5. It determines teaching efficiency
6. It prevents favoritism
7. It makes the results available to all
8. It reveals the ease or difficulty of the questions." (13)

Mann concluded that the superiority of the written examination
over the oral method was so clearly demonstrated
that no school committee would ever again venture to
return to the latter practice.

Some years ago it was decided to repeat the Boston
tests in order to secure data that would allow a
comparison between the relative advancement of pupils then
and now. The 1845 test material was analyzed carefully
from the point of view of present-day conditions. It
was found that many questions could not be given to the
pupils of today because of the shifting emphasis in the
aims and objectives of education. In general, the questions
that had to be deleted involved merely factual
knowledge. Thirty questions were selected, five
in each of six subjects, which seemed to have possibilities
for twentieth-century children. The modified test was
given to a large group of grammar school children in
various school systems throughout this country.

(13) Caldwell, Otis W., and Courtis, Stuart A., Then and
Now in Education 1845:1923, pp. 7, World Book Co., 1924.

"The outstanding conclusions from the Boston Tests
of 1845 are these:

1. Present-day children tend to make lower scores
on the pure memory and abstract skill questions
and higher scores on the thought or meaningful
questions;

2. the changes which have taken place are general
throughout the country; and

3. the efficiency of present instruction, even at
its best, although higher than in 1845, is still
far from [...]." (14)

The Development of Intelligence Tests 

It is safe to say that intelligence tests as we know
them today have developed within the last twenty-five
years. Intelligence tests, one of the most valuable
tools of the progressive educator today, are a gift of the
psychologists. "They emerged from experimental studies
of individual differences. In England, Galton was studying
individual differences by [...]." (15) According to Hildreth,
the concept of mental tests was introduced by Galton:
"Modern education and the science of child study
are greatly indebted to Galton. He undertook studies of
individual differences in imagery and sensory capacity,
collected materials on the problem of mental heredity, and
laid the foundation of biometrics. He introduced the concept
of mental tests and developed laws describing the
distribution of mental [...]." (16) His studies were continued
in America by Cattell and Thorndike about a half-century
later.

(14) Caldwell, Otis W., and Courtis, Stuart A., Then and
Now in Education 1845:1923, pp. 85, World Book Co., 1924.

(15) Lang, Albert R., Modern Methods in Written Examina-
tions, pp. 9, Houghton-Mifflin Co., 1930.

It is with the name of Alfred Binet that the beginning
of the intelligence testing movement is usually linked.
Symonds says, "It is perhaps unfair to ascribe the beginning
of the movement to Binet, but what was done before his
time seems insignificant beside his contribution. Binet, a
Frenchman (1857-1911), the son of a physician, was a genius
whose interests were both theoretical and practical." (17) At
about the same time (1894-95) that Dr. Rice was experimenting
in this country with his two spelling tests, Binet
was working in France on his mental tests. The Binet
scale was first produced in 1905. It is an individual-type
intelligence test; its administration is so complex
that it should be given by a trained psychologist.

(16) Hildreth, Gertrude H., Psychological Service for
School Problems, pp. 8, World Book Co., 1930.

(17) Symonds, Percival M., Measurement in Secondary
Education, pp. 53, The MacMillan Co., 1930.

The Binet test was produced originally for use in
France. Later it was revised so that it could be used in
this country. Symonds briefly describes the tests thus:
"These tests were a set of tasks to be performed under
controlled conditions and the responses were more or less
defined. In 1908 the first scale appeared with tests
grouped in age levels and the mental diagnosis was given
in terms of mental age. A revision, essentially the series
of tests used today, appeared in 1911." (18) The scale that
appeared in 1908 is known as the Binet-Simon scale for
the measurement of intelligence of school children. It is
called this because it represented the combined efforts of
Binet and his co-worker, a Frenchman named Simon.

The first English translation of the Binet-Simon scale
was made by Dr. H. H. Goddard of the Vineland, New Jersey,
Training School for the feeble-minded. Goddard was stimulated
to experiment with the new tests by the urgency of
dealing with the feeble-minded. Symonds contributes the
following: "Intelligence testing is the resultant of at
least five converging movements, some practical, some
theoretical. Perhaps foremost of the movements was the
very practical one of dealing with the feeble-minded." (19)
It is interesting to note that the first English translation
was used principally in institutions for the feeble-minded,
prisons, reform schools, and juvenile courts. The tests
were not satisfactory, however, for use with American
school children. "Goddard was the first psychologist to
make widespread use of the tests with school children in
the United States." (20)

(19) Ibid, pp. 53.

(20) Hildreth, Gertrude H., Psychological Service for
School Problems, pp. 32, World Book Co., 1930.

"In 1913, Dr. Lewis M. Terman, of Stanford University,
began a revision and extension of the Binet scale. The
Stanford Revision was published by Terman in 1916, and has
been widely used." (21) In fact, Terman's chief influence on
general education has been through his construction and
application of revisions of the Binet scale and the
interpretation of the results in the interests of general
welfare.

So far we have been dealing with individual intelligence
tests. An individual intelligence test is one that
can be administered to only one person at a time; it is
contrasted with group intelligence tests, which are capable
of testing a large number of people simultaneously. The
name of Dr. A. S. Otis is particularly noteworthy in this
connection. Symonds says, "To Otis properly belongs the
credit for compiling and publishing in 1918 the first
group test of intelligence as a measuring instrument." (22)

(21) Lang, Albert R., Modern Methods in Written Examina-
tions, pp. 10, Houghton-Mifflin Co., 1930.

(22) Symonds, Percival M., Measurement in Secondary
Education, pp. 56, The MacMillan Co., 1930.

Ruch and Stoddard give the following summary: "Just
previous to the entry of the United States into the
World War, Arthur S. Otis had been working at Stanford
University under the direction of Terman on a test of
intelligence which could be administered to large groups
of people at the same time. With the entry of the
United States into the war, Otis's materials were placed
at the disposal of the committee appointed to formulate
mental tests suitable for the examination of soldiers.
The Army Alpha tests were in considerable measure the
result of adaptation of the Otis materials." (23) Thus the
group intelligence test was born.

Rise of Standardized Tests

The first attempt at standardized objective tests is
associated with the name of the Reverend George Fisher, an
English schoolmaster, in 1864. He prepared a Scale Book
which attempted to differentiate between different levels
of work in composition, drawing, French, grammar, history,
knowledge, mathematics, navigation, practical science,
Scripture, spelling, and writing. These scales enabled
the examiner to assign numerical values to work in the various
subjects, the highest being 1 and the lowest 5, with
intermediate fourths between each value. Lang says,
"This attempt was too far ahead of the times to have any
immediate [...]." (24) Like the Boston examination, it was
so far ahead of its time that educators did not grasp its
vital significance.

(23) Ruch, G. M., and Stoddard, George D., Tests and
Measurements in High School Instruction, pp. 3,
World Book Co., 1927.

(24) Lang, Albert R., Modern Methods in Written Examina-
tions, pp. 12, Houghton-Mifflin Co., 1930.

The first attempt at standardized tests in this
country came exactly thirty years after George Fisher's
monumental work. American educators are indebted
this time to a great educational pioneer, Dr. J. M. Rice,
who devised a standardized test in spelling in 1894. Ruch
and Stoddard say, "Rice is probably entitled to the credit
of having produced the first educational test, for as
early as 1894-95 he constructed two spelling 'tests,'
one in list form and one in sentence form. Later he made
similar tests in arithmetic and [...]." (25) Rice's spelling
test consisted of a list of fifty words, and he went
around to the schools of the various towns administering
it. As a result of his testing, he concluded that pupils
who studied spelling fifteen minutes a day for eight years
did as well as pupils who spent thirty minutes daily for a
like number of years. The reader will note that Dr. Rice's
work was going on at the same time that Alfred Binet was at
work on his mental tests.

(25) Ruch, G. M., and Stoddard, George D., Tests and
Measurements in High School Instruction, pp. 2,
World Book Co., 1927.

It is fortunate that about the time that Dr. Rice was
carrying on his testing program, E. L. Thorndike was a
student at Columbia University. When Thorndike heard about
Rice's work he was intensely interested, notwithstanding
the fact that other educators had repudiated both Rice's
results and his testing methods. About this time, too,
the educational world was awakening to the need for modern
testing procedures. Russell has the following comment:
"Dr. Rice supplied a new technique of measurement, but
improved very little on the measures and measuring instruments
then in use. It remained for Dr. E. L.
Thorndike, then at Teachers College, Columbia University,
to establish some reliable units of measurement, which
he did in 1904, with the publication of his Introduction
to the Theory of Mental and Social Measurements." (26) Like the
work of Mann and Fisher, Rice's work was too far ahead of the
times. Work with the standardized test was really brought
to fruition upon Thorndike's entry into the field.

During all this time the American public was gradually
becoming school-conscious. "In our own country
education for the masses had seen a development never
before approached. By 1900 it was expected that every
individual should have a common-school education, and by
1925 a high-school education was coming to be looked upon
as the right of every boy and girl." (27) It was natural that
this trend should accelerate the development of better
testing instruments. The increased demands on the schools
made it imperative that testing methods be completely revamped.
Brueckner and Melby call attention to a breakdown
in the traditional school machinery when it undertook to
educate all the children. "Evidences of this breakdown
are found in the studies of Ayres, Maxwell, and [...]." (28)
At any rate, the need for diagnostic testing and remedial
teaching was brought into bold relief. This need still
persists, and at present conditions are
aggravated by the prevailing economic depression, which is
forcing back into the schools low-ability pupils who
ordinarily would be out working.

(26) Russell, Charles, Standard Tests, pp. 34, Ginn & Co., 1930.

(27) Brueckner, Leo J., and Melby, Ernest O., Diagnostic
and Remedial Teaching, pp. 18, Houghton-Mifflin Co., 1931.

Between 1903 and 1915 Dr. Thorndike and his students constructed a number of tests and scales, which were validated and standardized. Among the notable ones were a handwriting scale (Thorndike, 1909), several arithmetic tests (Stone, 1908, and Courtis, 1909), a scale in English composition (Hillegas, 1912), a spelling scale (Buckingham, 1913), and later two reading tests (the Thorndike Visual Vocabulary Scales, 1914-1916, and the Thorndike Scale Alpha Two for Measuring the Understanding of Sentences, 1915-1916). "With the beginning of the Stone Arithmetic Tests in 1908, and of the Thorndike Handwriting Scale in 1909, began what may be termed scientific measurement in the field of education."(29) The establishment of standards of accomplishment in the field of testing is attributed to Dr. S. A. Courtis of Detroit. Modern teachers take a particular interest in the norms and standards published with a good standard examination, because these let them compare the status of their group with the standard. Russell says, "The establishment of standards of accomplishment in the same field, indicating degrees of educational advancement in that field on the part of school children, using the same measure for all the pupils, and thereby making a standard of accomplishment for the various grades, was a big [illegible]."(30)

(28) Ibid, pp. 18.

(29) Lang, Albert R., Modern Methods in Written Examinations, pp. 13, Houghton-Mifflin Co., 1930.
Odell lists three factors that were of prime importance in the development of the modern test movement, viz.:

"1. The considerably increased interest in school marks during 1910 and the few years immediately following.

2. Another directly influential movement was the development of school surveys. . . . The first survey to employ such tests was that of New York City in 1911-12.

3. Several important periodicals began to devote considerable attention to test development. The Teachers College Record, the Journal of Educational Psychology, Educational Administration and Supervision, and School [illegible] are noteworthy in this respect. . . ."(31)

(30) Russell, Charles, Standard Tests, pp. 38, Ginn & Co., 1930.

(31) Odell, C. W., Educational Measurement in High School, pp. 35 and 36, The Century Co., 1930.
As the reader can see, all of these influences had their effect upon the new testing movement.

The school survey movement came into being in 1910 and has contributed much toward the supervision and improvement of instruction. The first school surveys gained great publicity because of the many crudities and defects they exposed in the educational structure. Existing instructional methods were challenged, and the complacency of the "stand-pat" teachers was jarred. It is now evident that the school survey has come to stay. Progressive teachers do not fear a school survey. They welcome it, because it lets their classes show to advantage. "As the school survey movement developed it soon changed in character from an occasional survey made by outside experts, to a continuous survey of production made from within by the superintendent of schools and his staff."(32) The latest development of the school survey idea is the creation of city bureaus of educational research to conduct testing programs, gather data, and interpret results.

The data from these surveys have greatly aided school superintendents in their work. Standardized tests have changed school administration from guesswork to scientific accuracy. School administrators and teachers need more knowledge about the defects and abnormalities of their school venture, because greater knowledge leads to deeper insight into school problems. Only in this way can the curriculum and courses of study be defended when the searchlight of investigation is turned upon them. The goal for all should be a more complete testing program.

(32) Cubberley, Ellwood P., Public Education in the United States, Houghton-Mifflin Co., 1934.


C. THE SOCIAL-BUSINESS STUDIES OF THE SECONDARY SCHOOL 

Commercial educators have long felt that the commercial curriculum should provide something more than purely vocational skill training. During the last thirty years it has become increasingly clear that new subject material should be added to the commercial course to enrich and supplement this training in skills. The social-business studies were gradually developed to supply the background of economic and legal principles and knowledge that is essential to building a social outlook and a social philosophy. Dr. Tonne says: "The social-business subjects must be justified in the course not because of their doubtful alliance with the vocational business subjects, but rather because of the contribution they are in a position to make to a more efficient economic education for the secondary-school [illegible]."(1)

(1) Tonne, Herbert A., and Tonne, M. Henriette, Social Business Education in the Secondary Schools, pp. 28, New York University Book Store, 1932.

This raises the question: "What subjects are included in the social-business group?" Authorities do not agree exactly on what the group includes. Anyone dealing with these subjects should also remember that reference material on them is scarce. As Dr. Tonne expresses it: "It can readily be seen that the available printed objective material in the social-business subjects is very meager indeed. In all these subjects there is much opportunity for good [illegible]."(2) Some good books dealing with these subjects have come out recently, however. In the most recent one, "Commercial Education in the High School," Professor Frederick G. Nichols includes the following subjects in this category: industrial and commercial geography, commercial law, economics, business organization, advertising and salesmanship, history of commerce, and junior business training. Dr. Tonne, the leading authority quoted above, would include all of these subjects plus business English and, possibly, short courses in marketing and banking. The only substantial difference between the two lists concerns business English. Whether this course belongs among the social-business studies will depend on the aims and objectives of the teacher giving it, and on the methods and materials that teacher employs. There is also a question about including the first year of bookkeeping. When this subject was taught some years ago, the course stressed preparation for vocational efficiency, and skill training was the predominant element. Today it is different. The emphasis is on teaching the principles of business through bookkeeping, and skill training is a secondary objective. It would seem, then, that both business English and elementary bookkeeping should be included in the fold of the social-business subjects.

(2) Ibid, pp. 224.

To arrive at the present status of the social-business group, we must first consider some of the history of early curriculum proposals. One of the earliest proposals of importance was made in 1903 by "The Committee of [illegible]."(3) This committee proposed that commercial geography be scheduled in the tenth year, commercial law and political economy each for a half-year in the eleventh year, and history of commerce for a half-year in the twelfth year. As regards the direct effect of this report, Professor Nichols says, "Nineteen years after this report appeared (1922) but little progress had been made in the direction of getting this recommendation for social-business subjects adopted, as will be seen from the following table:"(4)

Table 1

SUBJECTS                          STUDENTS
Bookkeeping ...................  [illegible]
Shorthand .....................     111,904
Typewriting ...................  [illegible]
Commercial Law ................      19,611
Commercial Geography ..........      56,616
Commercial History ............       8,307

(3) Commercial Education in High Schools (1903), University of the State of New York, College Department, Bulletin 235, pp. 5-7. Cited by Nichols, pp. 426.

(4) United States Bureau of Education Bulletin No. 35, Statistics of Public High Schools (1929), pp. 102. Cited by Nichols, pp. 426.
Note that the table does not mention political economy, although there were a hundred thousand students in economics, which was listed as an academic subject. The table gives no evidence of progress in banking, finance, and advertising, which the Committee had recommended as electives.

A study made by C. H. Marvin in 1922 shows clearly that the social-business subjects were being badly neglected.(5) He found the situation to be as follows:


Table 2

                                  NUMBER OF SCHOOLS
SUBJECT                           REPORTING COURSE
Bookkeeping ...................         109
Commercial Arithmetic .........          84
Business English ..............          94
Stenography ...................          89
Typewriting ...................          88
Commercial Geography ..........          65
Commercial History ............          14
Commercial Law ................          18
[illegible] ...................          13
Salesmanship ..................          12
Advertising ...................          12


A study made by Leverett S. Lyon in 1919(6) shows somewhat better results for these subjects. However, this study included only cities with a population of 100,000 or more. Its results may be tabulated as follows:(7)

(5) C. H. Marvin, Commercial Education in Secondary Schools, pp. 40, Henry Holt and Co., 1922. Cited by Nichols, pp. 427.

(6) L. S. Lyon, A Survey of Commercial Education in the Public High Schools of the United States, the University of Chicago Press, 1919.


Table 3

                      NO. OF     NO. OF      NO.          NO. INCLUDING   NO. TEACHING
SUBJECT               SCHOOLS    SCHOOLS     REQUIRING    AS ELECTIVES    IN COM. DEPT.
                      INCLUDED   REPORTING
Ind. History           224        136          31             12          [illegible]
Hist. of Commerce      224        136          25              6          [illegible]
Economics              224        136          49         [illegible]         24
Com. Geography         224        136          94             25              76
Com. Law               224        136          90             28              84
Bus. English           224        136          64              9              32
Salesmanship           224        136          15             25              31
Advertising            224        136           8             16              18
Com. Organization      224        136      [illegible]         7               8


This table shows that, apart from commercial geography and commercial law, none of these subjects was widely taught as a social-business subject in a program of commercial education. Mr. Lyon concludes that: "Social-business subjects, directed and taught as they are, sometimes by persons of purely classical training, cannot be relied upon to present any definite body of knowledge or consistent point of view. The evidence would seem to show that no definite point of view has been determined and that the results which are obtained from these courses must be varied in the [illegible]."(8)

(7) Table formed by combining several tables by Lyon in Education for Business, pp. 369. Cited by Nichols, pp. 427.

(8) L. S. Lyon, Education for Business, pp. 382, the University of Chicago Press, 1931. Quoted in Nichols, pp. 428.

These studies have focused more attention on the social-business subjects, and secondary schools appear to be giving the subjects greater prominence in the curriculum. Pupil enrolment seems to have grown steadily in all of these subjects except history of commerce. "The teaching of history of commerce has proved most unsatisfactory. Available instruction material was, and still is, scanty and faulty. Separating commercial and industrial incidents of history from their natural setting tends to produce a distorted and one-sided view of [illegible]."(9) History of commerce is unlikely to function as an independent subject in the social-business group until its subject matter has been reorganized and re-evaluated.

"Since 1919 considerable progress has been made toward appropriate emphasis on the social-business subjects, as the following statistics show:"(10)


Table 4

                              PUPILS--1915            PUPILS--1928
SUBJECT                     NUMBER   PER CENT       NUMBER   PER CENT
Commercial Law .........    19,611     0.91         76,434     2.64
Commercial Geography ...    56,616     1.70        140,246     4.84
History of Commerce ....     8,307     0.59          5,321     0.18
Economics ..............   100,540     4.80        147,035     5.08


This table shows substantial gains in enrolment in every course except history of commerce. The report from which the above data were taken does not list salesmanship or business organization, but many schools now teach these subjects. The various schools throughout the United States are still developing their offerings. At present the tendency seems to be to regard the social-business subjects as the core of the commercial curriculum.

(9) Frederick G. Nichols, Commercial Education in the High School, pp. 428, D. Appleton-Century Co., 1933.

(10) United States Bureau of Education Bulletin No. 35, Statistics of Public High Schools, 1929, pp. 102. Quoted by Nichols, pp. 431.

Now that the increased emphasis on the social-business subjects has been brought out, one further aspect of the topic remains to be considered. This concerns the relative proportions of boys and girls enrolled in the social-business courses. Scientific research has established that girls tend to enrol in the social-business studies more than boys. This holds true for all the subjects except economics, where the proportions are about equal. "This may be accounted for by the fact that economics is probably considered an academic subject rather than a business subject in most high schools."(11)

Professor Tonne presents the results of his study in the following table:(12)

(11) Tonne, Herbert A., and Tonne, M. Henriette, Social Business Education in the Secondary Schools, pp. 77, New York University Book Store, 1932.

(12) Ibid, pp. 79.
Table 5 - PROPORTION OF BOYS AND GIRLS ENROLLED IN THE
SOCIAL-BUSINESS SUBJECTS

                                  Number and Percentage
Subject                Boys   Per Cent    Girls   Per Cent    Total
Economics .........   9,255     48.5      9,852     51.5     19,107
Business Law ......  12,031     42.5     16,305     57.5     28,336
Economic Geo. .....  15,418     38.2     24,934     61.8     40,352
Business English ..  12,676     36.6     21,990     63.4     34,666
Bus. Organization .   1,852     52.8      1,658     47.2      3,510
Jr. Bus. Training .   1,337     41.1      1,920     58.9      3,257
Salesmanship ......   7,746     48.7      8,147     51.3     15,893
Advertising .......   3,255     49.3      3,347     50.7      6,602
His. of Commerce ..   1,613     43.7      2,080     56.3      3,693
Banking ...........     149     45.2        181     54.8        330

Read thus: In the subject Economics, data were secured for 9,255 boys in a study of 410 high schools. This is 48.5 per cent of the total number of students studied. Data were secured for 9,852 girls. This was 51.5 per cent of the total number of 19,107 students.
Professor Tonne accounts for the above situation in this way: "The fact that the proportion of boys to girls is not actually greater in all social-business subjects may be attributed among other reasons to: 1, the close traditional association of the social-business subjects with bookkeeping, stenography, and typewriting which appeal primarily to girls; 2, the subjects are not taught in such a manner that they will appeal to the interests of boys; and 3, improper guidance from the [illegible], other students, or from the high-school faculties."(13)

(13) Ibid, pp. 79.
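The "Read thus" note under Table 5 describes a simple rule. Each per-cent figure is that group's enrolment divided by the row total, multiplied by 100 and rounded to one decimal place. The minimal Python sketch below shows this arithmetic and checks it against the Salesmanship and Advertising rows of Table 5. The function name `shares` is hypothetical and chosen only for illustration.

```python
def shares(boys: int, girls: int) -> tuple[int, float, float]:
    """Return the row total and the boys' and girls' shares of it,
    in per cent rounded to one decimal place, as in Table 5."""
    total = boys + girls
    boys_pct = round(100 * boys / total, 1)
    girls_pct = round(100 * girls / total, 1)
    return total, boys_pct, girls_pct


# Salesmanship row: 7,746 boys and 8,147 girls
print(shares(7_746, 8_147))   # (15893, 48.7, 51.3)
# Advertising row: 3,255 boys and 3,347 girls
print(shares(3_255, 3_347))   # (6602, 49.3, 50.7)
```

Applying the same function to the other rows is a quick way to check a table's totals after copying it.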
As shown above, the social-business subjects are coming into greater prominence. Commercial educators now generally agree that this tendency is desirable. The old basis of business education, and of the present business-college system, is fundamentally the development of skills. Professor Tonne cites H. G. Shields in this connection: "The inadequacy of our present secondary-school business curriculum in providing the vocational-school student with general business training is at present one of the most important issues among certain thinkers in business education. H. G. Shields, School of Commerce, University of Chicago, deplores the fact that business education as it is today is really clerical education. Shields is of the opinion that the student in the present curricula in business education is being trained merely in technical skills and is in no way being oriented in the realities of business [illegible]."(14) This situation can be remedied by inserting balanced social-business materials, properly co-ordinated and integrated, so that the business curriculum may recapture its lost values.

(14) H. G. Shields, "Our Clerical Mills," The Journal of Business Education, Vol. 4, May 1930, pp. 34. Cited by Tonne, pp. 52.

It is generally admitted that post-depression conditions require every business man to show his mettle. Conditions are tense, and any unlucky individual who does not measure up to the set standards faces bankruptcy or loss of position. A few decades ago high school graduates were content to enter small businesses and "learn the business." Today the lure of romance and adventure centers upon the colossus of business, the vast corporation with its many subsidiaries. Secondary school graduates often feel this gravitating force and, as a result, take jobs with nationally known companies. The point is this: unless these young graduates study the principles and business knowledge underlying the work of these giant combines, they will be entirely unoriented. Failure, with its accompanying sense of inferiority, will follow. This situation can be avoided only by carefully including social-business material in the commercial pupil's high school course.

It is a mooted question whether a commercial educator, in his vocational guidance work, has the right to advise pupils to enter the field of business if he feels that they will not advance above the clerical level. The question applies with particular force to boy pupils. It must be borne in mind that the commercial course provides the entree into business, nothing more. Once the young graduate gets his foothold, he must climb according to his own ability. The danger of becoming stranded on a lower rung of the ladder is real and ever-present. Professor Nichols says, "As a partial offset against the tendency to become stranded on the clerical level, every high school commercial pupil should learn something of the fundamental principles of business management, become somewhat familiar with the functions of major departments of a business organization, acquire some small degree of understanding of legal principles applicable to business transactions, secure some rudimentary comprehension of what are regarded as important economic principles, and give a little attention to the study of basic industries as possible fields with which to become identified later. The social-business subjects seem to be the best media through which to achieve these [illegible]."(15)

(15) Frederick G. Nichols, Commercial Education in the High School, pp. 436, D. Appleton-Century Co., 1933.

The query is often raised: "Why do not the social-business subjects yield greater educational values in the way of an improved social outlook and social philosophy?" The heart of the problem is the organization of the social-business courses. An impartial examiner of representative courses of study in these subjects would be forced to conclude that the subject matter is poorly organized. Professor Tonne explains this faulty organization thus: "The usual obstacles, such as a general lack of acceptance of sufficiently uniform objectives, the divergence of ideas as to the content of the social-science curriculum, the overlapping of subject matter, unsuitable textbooks, and poorly trained teachers, are factors which are all present and [illegible]."(16)

Two conditions will have to be remedied before the social-business group yields its full values. The first has to do with the grade placement of materials. Professor Tonne says, "The social-business subjects should be spread out to a much greater extent among the four years of high school. In this way they will not compete with each other and with the other subjects of the curriculum which struggle for the twelfth year, which for many reasons seems to be considered the choice year."(17) The second condition goes back to the methods used in presenting these subjects: the different subjects lack correlation and integration with one another. Professor Tonne points out this weakness thus: "This situation shows that the social-business subjects function as independent units, for the most part entirely apart from each other and from other subjects in the curriculum."(18) In conclusion, the materials in the social-business studies will need a complete re-evaluation and reorganization in the future. Only then can these studies yield the rich dividends needed for a fuller education of pupils enrolled in the commercial curriculum.

(16) Tonne, Herbert A., and Tonne, M. Henriette, Social Business Education in the Secondary Schools, pp. 63, New York University Book Store, 1932.

(17) Ibid, pp. 81.

(18) Ibid, pp. 88.
D. THE CRITERIA OF A GOOD EXAMINATION
Personal judgment determines to a great extent whether an examination is good or bad. Before any judgment can be made, we must take into consideration the type of examination and the function or functions it is to perform. An examination can be rated in the same way as a story or a picture. In each case, no rating can be attempted until the judges agree upon the properties to be used as standards. These standards must be reasonable criteria. "Hardness" might serve as a quality by which to judge rocks, but it is meaningless as a standard for judging examinations. "The qualities which test experts have agreed upon as composing the most serviceable and straightforward standards for evaluating examinations are:"(1)

1. Validity
2. Reliability
3. Objectivity
4. Comprehensiveness
5. Facility
6. Utility
7. Rapport

Professor Lang contends that these seven qualities include all the desirable characteristics of good examinations. The worth of an examination depends on the degree to which they are present.

(1) Lang, Albert R., Modern Methods in Written Examinations, pp. 47, Houghton-Mifflin Co., 1930.
Having decided upon standards for evaluating an examination, we must now consider each one in detail. By far the most important is validity. Ruch says, "The most important single fact which can be known about a test or examination is the degree of validity which it possesses."(2) From the teacher's standpoint, a test is not valid unless it covers all the important items taught. The pupil does not consider a test good unless it stresses the important parts of the unit or course he has studied. Both of these requirements are elements in the validity of the test.

The terms "goodness" and "worthwhileness" are sometimes used as synonyms for validity. It is of prime importance that the test-maker put into the test or examination the elements or items that are essential and take pains to eliminate the nonessentials.

Professor Ruch gives a number of ideas that, taken collectively, represent the concept of validity. These are:(3)

"1. Validity is the degree to which a test or examination measures what it is intended to measure.

2. Validity is the general worthwhileness of an examination.

3. Validity refers to the care taken to incorporate in a test or examination those elements or items which are of prime importance, and to the pains taken to eliminate the non-essential.

4. Validity is in general the degree to which a test parallels the curriculum and good teaching practice.

5. Validity refers to the value of the test for measuring specific abilities in an accurate fashion, and a test ceases to have validity when applied to the measurement of abilities for which it was not intended."

(2) Ruch, G. M., The Objective or New-Type Examination, pp. 27, Scott, Foresman and Co., 1929.

(3) Ibid, pp. 28.

All too often the classroom teacher assumes that a test measures reading ability simply because it is labeled a reading test. Some tests have been proved to be wrongly named. Yet the classroom teacher seldom looks into how a test was built in order to find out what the test-maker intended it to measure or what functions he meant it to perform.

Of Professor Ruch's ideas of validity listed above, the third deserves amplification. Care should be taken to include in the examination the instructional materials that are really worth while. The most significant items can be determined from the course of study, the basic textbooks, and what authorities consider the minimum essentials. In addition, irrelevant factors such as penmanship, spelling, English, neatness, arrangement, and speed of writing should be eliminated or minimized unless the examination is intended to measure one of them. A valid commercial geography test is one that measures achievement in commercial geography and nothing else. If factors other than commercial geography enter into the measurement, the test becomes invalid as a commercial geography test.

Professors Ruch and Stoddard give a fine summary of the most frequently used validation methods. These include the following:(4)

"1. Textbook analysis
2. Analysis of courses of study
3. Analysis of final examination questions
4. Pooled judgments of competent persons
5. Use of rating scales in setting up criteria
6. Correlations with school marks or other measures of school success
7. Increase in percentage of successes with successive ages or grades
8. Correlations with previously validated measures
9. Differential scores shown by two groups known to be widely separated upon a scale of ability
10. Determination of social utility
11. Logical or psychological analysis
12. Correlations with tests of other intellectual, non-intellectual or educational abilities."


(4) Ruch, G. M., and Stoddard, George D., Tests and Measurements in High School Instruction, pp. 304, World Book Co., 1927.

This summary includes the methods used for validating both psychological and educational tests. The scope of this thesis embraces only the field of educational tests and measurements, so some of these methods must be set aside. Methods 5, 8, 9, 11, and 12 are used to validate psychological tests or trade tests, and consequently they will not be treated here. All the other methods are useful for testing in the social-business subjects, although some are much more important than others. Inspection of the list shows that the first four reduce to the single criterion of expert opinion. Methods 6, 7, and 10 are experimental in character. They aim at greater refinement than is possible or necessary in constructing informal objective tests for classroom use. Sometimes a combination of methods is used to validate a test.
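Method 6, correlation with school marks, is ordinarily expressed as a coefficient of correlation between pupils' test scores and their marks. The text does not say which coefficient Ruch and Stoddard had in mind, so the sketch below assumes the Pearson product-moment coefficient. It computes the coefficient in Python using only the standard library. The function name `pearson_r` and all of the scores and marks are hypothetical and serve only to show the calculation.

```python
import math


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson product-moment correlation between two paired score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least two pairs")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations, and sums of squared deviations
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("a list with no variation has no defined correlation")
    return cov / math.sqrt(ss_x * ss_y)


# Hypothetical data: test scores and teachers' marks for the same six pupils
test_scores = [42, 55, 61, 48, 70, 66]
school_marks = [70, 78, 85, 72, 92, 88]
print(round(pearson_r(test_scores, school_marks), 2))
```

Under this method, a coefficient near +1 would suggest that the test ranks pupils much as their marks do. A low coefficient would cast doubt on the test, or on the marks used as the criterion.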

Each method will now be taken up and explained in detail. The first is the "textbook analysis" method. Professor Carlson used this method very successfully in his Series D Bookkeeping Tests to accompany "Twentieth Century Bookkeeping and Accounting."(5) Suppose that a bookkeeping examination covering the first year's work is being prepared for use throughout the state, and that the various high schools use a number of different standard bookkeeping textbooks. In that case, the material taught by a teacher using one textbook may well differ somewhat from the material taught by a teacher using another. These differences will make the work of the committee preparing the examination difficult.

(5) Carlson, Paul A., The Measurement of Business Education, pp. 9, Monograph No. 18, South-Western Publishing Co., 1932.

In the first step, each textbook is analyzed carefully and its important points are written down as statements. Next, the results from the various textbooks are consolidated, with adjustments to guard against repetition and overlapping. These items are then turned into test questions, using whatever test technique (true-false, completion, multiple-choice, matching, etc.) best suits each item. Some provision must then be made to ensure that no single phase of the subject matter is overemphasized. In most tests the "Table of Specifications," described later, would take care of this. In a bookkeeping test, the test material can be checked against the accounting cycle, since every textbook is supposed to cover the cycle. During this check the test-makers should also make sure that every step in the cycle has been covered completely. If this whole procedure is used, the bookkeeping contest examination should parallel all one-year bookkeeping courses in the state. Professor Carlson then goes on to describe how he combines this method with the "pooled judgments" method before finally validating his tests.
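The consolidation and checking steps described above can be pictured as simple set operations. The Python sketch below is only an illustration under stated assumptions. The textbook item lists are hypothetical, and each "item" is reduced to a short label rather than a full statement. The six-step list standing in for the accounting cycle is likewise an assumed simplification and is not Professor Carlson's. The names `consolidate` and `uncovered_steps` are hypothetical as well.

```python
from collections import Counter

# Hypothetical results of step one: the points found in each textbook.
analyses = {
    "Text A": ["journalizing", "posting", "trial balance", "work sheet"],
    "Text B": ["journalizing", "posting", "trial balance", "financial statements"],
    "Text C": ["journalizing", "posting", "posting", "work sheet"],
}

# Assumed, simplified steps of the accounting cycle used for the final check.
ACCOUNTING_CYCLE = [
    "journalizing", "posting", "trial balance",
    "work sheet", "financial statements", "closing entries",
]


def consolidate(analyses: dict[str, list[str]]) -> Counter:
    """Merge the textbook analyses, counting each item once per textbook
    so that repetition within a single book is not double-counted."""
    counts: Counter = Counter()
    for items in analyses.values():
        counts.update(set(items))
    return counts


def uncovered_steps(counts: Counter, cycle: list[str]) -> list[str]:
    """Return the cycle steps that no textbook item covers."""
    return [step for step in cycle if step not in counts]


counts = consolidate(analyses)
print(counts.most_common())
print(uncovered_steps(counts, ACCOUNTING_CYCLE))  # ['closing entries']
```

In this sketch the counts give only a rough signal of how widely each item is taught. The text leaves questions of emphasis to the Table of Specifications.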

Before we attempt to evaluate this method, we should remember that it is in common use. Its main claimed advantages are that it is simple to use and that tests constructed by it tend to fit the actual teaching practice of the day rather closely. Note also that the textbook analysis method reduces fundamentally to the method of pooled judgments of competent persons, because each textbook's content represents the judgment of one or more supposedly competent persons.

Professors Ruch and Stoddard call attention to the
following disadvantage: "A test which represents nothing
more than a composite picture of the content of ten,
fifteen, twenty or more representative textbooks in a
given school subject cannot rise above the level of
measuring what is actually taught in the rank and file of
schools. Such a test fails to a degree, because it does
not measure what ought to be taught."(6) If this is true,
the textbook analysis method would be fair for the majority
of teachers who use the standard pedagogical methods but
would be unfair for the small minority of "progressive"
teachers who use the textbook as a guide rather than a
bible. In conclusion, these same authors say, "This method
is not to be advocated where better methods can be
employed."(7)

(6) Ruch, G. M., and Stoddard, George D., Tests and
Measurements in High School Instruction, pp. 505,
World Book Co., 1927.

The next method to be considered is "validation by
analysis of courses of study." As this is one of the minor
methods, it will be treated very briefly. It is a variant
of the textbook analysis method. It is not considered as
good as the latter, however, because courses of study all
too often include a multiplicity of detail about aims and
objectives and merely scant outlines of the teaching
material. Professors Ruch and Stoddard conclude that:
"On the whole, the analysis of courses of study is inferior
to the textbook analysis, due to the fact that courses of
study in their usable (published) form are far less detailed
than textbooks."(8)

The "analysis of examination questions" method, the
next method to be explained, is considered a variant of the
two preceding methods. It has about the same limitations.
This method has possibilities if it is used in conjunction
with other methods. In this method, the test-maker gets
in touch with leading teachers in a certain subject and
asks them to submit copies of the final examinations they
have used over a period of years. The test-maker then
analyzes these and eliminates all repeated items. The
remaining items are then summarized under appropriate
headings covering the individual phases of the subject.
After this, the work of converting the individual items into
test questions is carried out. In concluding about this
method, Ruch and Stoddard say, "The teacher's selections of
questions for her final examination represent an additional,
thoughtful culling out of the non-essential. . . . This
selection tends toward added refinement over the textbook
or course-of-study analysis method. It shades over at the
same time to the method of judgments of competent
persons."(9)

(7) Ibid, pp. 307.
(8) Ibid, pp. 307.

The method par excellence today is the "pooled judgments
of educational authorities," the next method to be taken
up. All the methods taken up so far have really rested in
their final analysis upon pooled judgments. We have noted
previously how Professor Carlson made use of the textbook
analysis method in validating his Series D Bookkeeping
Tests. Later, after the bookkeeping examination had been
used extensively, he obtained ratings on the examination
from hundreds of bookkeeping teachers and thousands of
pupils who had taken it. These ratings were evaluated in
order to ascertain whether a unanimity of opinion had been
expressed that the bookkeeping examination did cover
thoroughly the contents of a one-year course in bookkeeping.
This unanimity of opinion was found present in the ratings
given Professor Carlson's tests; consequently they are said
to possess high validity.

(9) Ibid, pp. 308.

The "pooled judgments" method is of great importance
in connection with the sifting of the original lot of
tentative test items with a view to the elimination of the
least valid materials. "Experience has shown that the
average or median judgment of a group of from three to
ten careful judges is certain to be superior to the opinion
of a single worker in approximating the true worth and
difficulties of proposed test items."(10) The valid
examination should parallel the flow of actual teaching and
should represent an extensive sampling of the materials of
instruction. The pooled judgment of competent persons is
a good index as to whether or not this standard has been
attained. In conclusion, Ruch and Stoddard say, "The
method of pooled judgments, alone or in combination with
other methods, is by far the most common validation practice
in educational test construction today. Unfortunately,
the judgments very often represent the opinions of but
two or more persons, and are not checked up against
experimental evidence."(11)
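The median rule quoted above can be sketched in a few lines of Python. The five-point rating scale, the retention cutoff of 3, and the ratings themselves are hypothetical assumptions made for illustration; the sources quoted here prescribe none of them. The median is used because a single careless or extreme judge moves it less than the mean.

```python
from statistics import median

# Hypothetical ratings of each tentative item by five judges,
# on an assumed scale of 1 (worthless) to 5 (essential).
ratings = {
    "item 1": [5, 4, 5, 4, 5],
    "item 2": [2, 1, 3, 2, 2],
    "item 3": [4, 3, 2, 4, 3],
}

CUTOFF = 3  # hypothetical: keep items whose median rating is at least 3

def pooled_judgment(ratings):
    """Return each item's median rating across all judges."""
    return {item: median(r) for item, r in ratings.items()}

pooled = pooled_judgment(ratings)
retained = [item for item, m in pooled.items() if m >= CUTOFF]
print(pooled)    # {'item 1': 5, 'item 2': 2, 'item 3': 3}
print(retained)  # ['item 1', 'item 3']
```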


The use of the correlation method makes it possible
to determine the degree of validity of a test in numerical
terms. This brings us to the next method, which is
"correlations with school marks." It is very common practice
to use school marks in the validation of test materials,
since these marks are easily obtainable. This method is of
no little value to test-makers. The mathematics of
statistical correlation will be reserved for a later topic
in this thesis and will there be explained in detail. In
general, the usual procedure is to correlate test scores
and marks by the Pearson product-moment formula and then
make additional statistical corrections in order to produce
a greater degree of refinement. Ruch and Stoddard say,
"Such correlations are never high, 0.85 being about the
highest which the writers have ever seen reported in the
literature for single classes. The reason for such low
correlations with school marks is to be found in the low
reliability of the marks."(12)

(10) Ibid, pp. 310.
(11) Ibid, pp. 312.
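The statistics of correlation are deferred in the thesis, but the basic product-moment computation is short enough to sketch here in Python. The pupils' scores and marks are invented for illustration, and the "additional statistical corrections" mentioned above are not attempted.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of the same length, n >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [a - mean_x for a in x]          # deviations from the mean
    dy = [b - mean_y for b in y]
    sxy = sum(a * b for a, b in zip(dx, dy))
    sxx = sum(a * a for a in dx)
    syy = sum(b * b for b in dy)
    if sxx == 0 or syy == 0:
        raise ValueError("one of the lists has no variation")
    return sxy / sqrt(sxx * syy)

# Hypothetical test scores and school marks (per cent) for eight pupils.
scores = [42, 55, 61, 38, 70, 49, 66, 58]
marks = [75, 80, 85, 70, 90, 78, 82, 88]
print(round(pearson_r(scores, marks), 2))
```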

"Validation through percentage increases in successes"
is the next method to be examined. This method is of great
importance in the drafting of standard tests for grammar
school use. The first step consists of drawing up an
experimental edition of the test and administering it to
hundreds of pupils in different school grades. The pupils
are then grouped according to either age or grade level,
and the percentage of pupils passing each item is computed.
If the percentage of successes rises sharply and uniformly
from one grade to the next, the item is judged to be valid
and reliable because it discriminates between different
levels of ability. If the rise in the percentage of
successes is erratic, however, the item is invalid and
should be eliminated. "In summary, items with 'throwbacks'
are a source of great unreliability. Items passed by 0 per
cent or 100 per cent are functionless but do not cause
unreliability. They are to be looked upon as 'dead timber.'
The sharper the rise, the greater the reliability of the
item."(13)

(12) Ibid, pp. 318.
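The grade-by-grade check can be sketched in Python. The source gives no numeric test of a "sharp and uniform" rise, so this sketch adopts a simple assumed rule: any fall from one grade to the next counts as a throwback, an item flat at 0 or 100 per cent is dead timber, and any other flat item is treated as showing no rise. The per cents below are hypothetical.

```python
def percent_passing(results_by_grade):
    """Per cent of pupils passing one item, grade by grade.

    `results_by_grade` maps a grade (or age) to a list of 1 (pass) and
    0 (fail), one entry per pupil tested at that level.
    """
    return {level: 100 * sum(r) / len(r)
            for level, r in sorted(results_by_grade.items())}

def judge_item(percents):
    """Classify an item from its per cents of success by grade."""
    values = [percents[level] for level in sorted(percents)]
    if all(v == values[0] for v in values):
        # No rise at all: functionless if 0 or 100 per cent everywhere.
        return "dead timber" if values[0] in (0, 100) else "no rise: eliminate"
    if any(later < earlier for earlier, later in zip(values, values[1:])):
        return "throwback: eliminate"
    return "rises with grade: retain"

# Hypothetical per cents of success in grades 4 to 8.
item_a = {4: 20, 5: 35, 6: 52, 7: 70, 8: 85}
item_b = {4: 30, 5: 55, 6: 40, 7: 65, 8: 60}
item_c = {4: 100, 5: 100, 6: 100, 7: 100, 8: 100}
for name, p in [("A", item_a), ("B", item_b), ("C", item_c)]:
    print(name, judge_item(p))
```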

Even a casual examination of this method will show
that it is unsuited to most high school subjects,
particularly those in the commercial curriculum. Ruch and
Stoddard point out that: "It is limited in its utility only
in two directions; viz., (a) in a few physiological
capacities which do not continue to develop over a period
of years, and (b) in those school subjects which are
discontinuous over a series of school years."(14) The
social-business subjects would be subject to the second
limitation, as they are mainly grouped in grades 11 and 12
of the high school.

(13) Ibid, pp. 322.
(14) Ibid, pp. 319.

The "validation by the principle of social utility"
method is limited in usefulness. In short, this method
presupposes a checking of the test in the light of the
social significance of the information called for in it.
In other words, the test should include to a great extent
only knowledge and information that will actually function
in the lives of the individual pupils. In the construction
of a curriculum, the determination of the social usefulness
of educational content is a weighty problem. In fact, at
the present time the content of many commercial courses
is being attacked on the ground of social utility. The
opponents of these courses claim that the instructional
content is ill-adjusted to meet the needs of present-day
youth. It is patent, then, that this problem of social
utility must be considered for tests as well.

It is not possible to validate all subjects by this
method. "In a few subjects, chiefly elementary, important
experimentation has been done in this direction. Extensive
counts have been made by Thorndike and Horn on vocabularies;
by Horn and Ashbaugh on spellings used in business; and by
Wilson and others on the arithmetic of business."(15)
This method has not been of great importance up to the
present in the construction of tests and examinations for
the social-business studies.

(15) Ibid, pp. 525.

In order to complete our list of validation methods
in the field of educational tests and examinations, it is
necessary to include some explanation of the method for the
validation of individual test items. Professors Ruch and
Stoddard did not include this method in their list, but it
has been used extensively in recent years, mainly in
combination with other methods. "The assumption is that if
each item of the test is valid, the entire test - the sum
of valid parts - possesses a high degree of validity."(16)
This technique was used as one of the steps in validating
the Peters, Greiner and Green Commercial Law Tests.(17) In
this process, the test was administered in the commercial
law classes of a number of high schools, and several
hundred papers were obtained. After being corrected
carefully, these papers were arranged in order from the
highest to the lowest scores. The best ten per cent of the
papers were separated from the pile, and then the poorest
ten per cent. Every answer on the best ten per cent of the
papers was compared with the same answer on the poorest ten
per cent. If the pupils with the best papers did not
clearly demonstrate their superiority in knowledge on a
given item, then the item was weeded out. It is obvious
that if the "poorer" pupils answered an item correctly more
times than, or the same number of times as, the "better"
pupils, something was inherently wrong with the item. "All
of the items retained discriminated between good and poor
pupils and were therefore considered valid items."(18)

(16) Carlson, Paul A., The Measurement of Business
Education, pp. 10, South-Western Publishing Co., 1932.
(17) Published by South-Western Publishing Co.
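The weeding-out comparison used for the commercial law tests can be sketched in Python. The paper format, the `fraction` parameter, and the toy data are assumptions. The original study used several hundred papers and ten per cent groups; the toy example uses 20 per cent of ten papers so that each group holds two. "Clearly demonstrate their superiority" is not quantified in the source, so the sketch rejects an item only when the poorest group does at least as well as the best.

```python
def extreme_groups(papers, fraction=0.10):
    """Return the best and the poorest `fraction` of the scored papers.

    `papers` is a list of (total_score, answers) pairs; `answers` holds
    1 (right) or 0 (wrong) for each item, in test order.
    """
    ranked = sorted(papers, key=lambda p: p[0], reverse=True)
    k = max(1, int(len(ranked) * fraction))
    return ranked[:k], ranked[-k:]

def weed_out(papers, fraction=0.10):
    """Item numbers (from 1) that the best papers do not answer right
    more often than the poorest papers do."""
    best, poorest = extreme_groups(papers, fraction)
    rejected = []
    for i in range(len(papers[0][1])):
        right_best = sum(answers[i] for _, answers in best)
        right_poor = sum(answers[i] for _, answers in poorest)
        if right_poor >= right_best:
            rejected.append(i + 1)
    return rejected

# Hypothetical papers: (total score on the whole test, answers to 3 items).
papers = [
    (95, [1, 1, 0]), (90, [1, 1, 1]), (80, [1, 0, 1]), (75, [0, 1, 1]),
    (70, [1, 0, 0]), (60, [0, 1, 1]), (55, [0, 0, 1]), (50, [1, 0, 0]),
    (40, [0, 0, 1]), (30, [0, 0, 1]),
]
print(weed_out(papers, fraction=0.2))  # [3]
```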

Professor Ruch calls this method "an experimental
method of validating individual test items." He so labels
it in order to bring out that this method makes possible
a greater degree of refinement than any of the methods
involving "pooled judgments." He next gives a series of
seven steps to be followed, viz.:


"1. Make up the test items, arranging them by
inspection in order of difficulty.

2. Give the test to the class, allowing time for
all to attempt every item.

3. Score the papers.

4. Arrange the papers in order of size of scores.

5. Find the median mark and separate the papers
into two classes: (a) those above the median,
and (b) those below the median. Call the first
group the 'good' pupils and the second group
the 'poor' pupils.

6. Tabulate the number of pupils passing (or
failing) each individual test item, keeping
separate tabulations for the 'good' and 'poor'
pupils. Express the passes (or failures) in
per cents.

7. Study the per cents for 'good' and 'poor'
groups. Reject items where the 'poor' group
shows percentages of successes as high as or
higher than the 'good' group. Such items do
not differentiate abilities. The best items will
show the largest differences in successes in
favor of the 'good' group."(19)
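Steps 3 to 7 can be sketched in Python as follows. The answer sheets are hypothetical, and one detail is an assumption: the quoted steps do not say where pupils scoring exactly at the median belong, so this sketch leaves them out of both groups.

```python
from statistics import median

def good_poor_percents(answer_sheets):
    """Steps 3-6: score, split at the median, tabulate per cents passing.

    Each answer sheet is a list of 1 (pass) / 0 (fail), one per item.
    Pupils scoring exactly at the median go into neither group.
    """
    scores = [sum(sheet) for sheet in answer_sheets]
    mid = median(scores)
    good = [s for s, t in zip(answer_sheets, scores) if t > mid]
    poor = [s for s, t in zip(answer_sheets, scores) if t < mid]
    if not good or not poor:
        raise ValueError("scores do not split into two groups")

    def pct(group, i):
        return 100 * sum(sheet[i] for sheet in group) / len(group)

    n_items = len(answer_sheets[0])
    return [(pct(good, i), pct(poor, i)) for i in range(n_items)]

def screen(percents):
    """Step 7: drop items where 'poor' >= 'good'; rank the rest by the
    size of the difference, best item first."""
    kept = [(i + 1, g - p) for i, (g, p) in enumerate(percents) if g > p]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)

# Hypothetical answer sheets for seven pupils on a four-item test.
sheets = [
    [1, 1, 1, 0], [1, 1, 1, 0], [1, 1, 0, 1],  # totals 3, 3, 3
    [1, 0, 0, 1], [0, 1, 0, 1],                # totals 2, 2 (the median)
    [1, 0, 0, 0], [0, 0, 0, 1],                # totals 1, 1
]
percents = good_poor_percents(sheets)
for item, gap in screen(percents):
    print(f"item {item}: good minus poor = {gap:.1f}")
# item 2: 100.0, item 3: 66.7, item 1: 50.0; item 4 is rejected
```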


(18) Carlson, Paul A., The Measurement of Business
Education, pp. 10, South-Western Publishing Co., 1932.
(19) Ruch, G. M., The Objective or New-Type Examination,
pp. 57, Scott, Foresman and Co., 1929.


Ruch illustrates the method by means of an example. 
The illustration he uses is worthy of careful study. 


Table 6 - PER CENTS OF "GOOD" AND "POOR" PUPILS ANSWERING
INDIVIDUAL ITEMS OF A TEST

               Per Cent of Correct Answers
Item     "Good" Group    "Poor" Group    Both Groups
 1.           14              14              14
 2.           21               7              14
 3.            0               6               3
 4.           84              16              50
 5.           53              49              51
 6.          100              98              99
 7.            0               0               0
 8.          100             100             100
 9.            0               8               4
10.           50              50              50


An analysis of this table reveals the following facts:

Item 1 does not properly differentiate between high
abilities and low abilities and consequently might be
replaced to advantage by an item showing greater
differentiation.

Item 2 is greatly superior to item 1. The retention of
items like number 2 will give the test greater validity
than the retention of items like number 1.

Item 3 should be discarded, since the "poor" group answers
it correctly more often than the "good" group.

Item 4 is a good one. There is sharp discrimination
between good and poor pupils.

Item 5 does not hurt the test, but it is distinctly
inferior to item 4. If possible, it should be replaced
with an item like number 4.

Item 6 is very easy for both groups of pupils. A few items
like this should be retained and put at the beginning of
the test in order to create a sense of "rapport" with the
pupils.

Item 7 should be thrown away, except that a few such items
may be inserted in the test to prevent perfect scores.

Item 8 is similar to item 6.

Item 9 should be eliminated, for the same reason as item 3.

Item 10 probably should be eliminated because it does not
differentiate between the abilities of the higher and lower
groups.
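The item-by-item verdicts on Table 6 can be expressed as a small classification rule in Python. The two cutoffs (95 per cent for "very easy" and a 10-point gap for good discrimination) are hypothetical; Ruch's analysis gives no numbers. Where the scanned table is hard to read, the values used are those implied by the "Both Groups" column, which in every legible row is the average of the other two.

```python
# Table 6: item -> (per cent right in "good" group, in "poor" group).
TABLE_6 = {1: (14, 14), 2: (21, 7), 3: (0, 6), 4: (84, 16), 5: (53, 49),
           6: (100, 98), 7: (0, 0), 8: (100, 100), 9: (0, 8), 10: (50, 50)}

EASY = 95    # hypothetical: both groups at or above this are "very easy"
STRONG = 10  # hypothetical: a good-minus-poor gap at least this is strong

def verdict(good, poor):
    """Sort one item into the categories used in the analysis above."""
    if good == 0 and poor == 0:
        return "failed by all: keep only a few"
    if good >= EASY and poor >= EASY:
        return "very easy: keep a few at the start"
    gap = good - poor
    if gap < 0:
        return "discard"
    if gap == 0:
        return "does not differentiate"
    if gap >= STRONG:
        return "discriminates well"
    return "weak: replace if possible"

for item, (good, poor) in TABLE_6.items():
    print(item, verdict(good, poor))
```

The order of the checks matters: item 6 differs by only 2 points, so it has to be caught as "very easy" before the gap test would call it weak.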

Certain of the validation methods that have been
described in detail so far are applicable only to the
construction of standard tests. Professor Ruch has a
number of suggestions for validating tests, with
teacher-built objective tests particularly in mind. He
proposes the following:

"1. In the course of regular teaching, make a
practice of jotting down good test items
(questions) as they occur to you.

2. Place these test items on small bits of paper;
3x5 library cards are best. Make a file of
these questions. Secure a filing case and keep
these cards. . . .

3. When the time comes to build an examination,
draw up a Table of Specifications. This will
tend to guarantee a defensible balance of
emphasis, freedom from non-essentials, and the
inclusion of all important topics.

4. After the test is given, ask the pupils to
suggest items that were ambiguous, misleading
or not understood. These should be either
revised or discarded.

5. Where possible, try to have one or two other
teachers criticize your test items and rate
them for difficulty.

6. The validity of a test is raised by having
the items of a proper degree of difficulty.
Items passed by every child or failed by all
contribute nothing to the test. . . .

7. The validity of a test is increased by having
the easiest items first and the hardest ones
last."(20)
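Suggestions 6 and 7 can be sketched in Python, assuming the teacher has a tryout pass rate (fraction of pupils passing) for each item. The item labels and rates are hypothetical. The sketch follows suggestion 6 literally; a teacher following the Table 6 analysis might instead keep a few very easy items at the start.

```python
def arrange_items(pass_rates):
    """Drop items passed by every pupil or failed by all (suggestion 6),
    then order the rest from easiest to hardest (suggestion 7).

    `pass_rates` maps an item label to the fraction of pupils passing it.
    """
    useful = {item: p for item, p in pass_rates.items() if 0 < p < 1}
    return sorted(useful, key=lambda item: useful[item], reverse=True)

# Hypothetical tryout results for five questions.
tryout = {"Q1": 0.45, "Q2": 1.0, "Q3": 0.80, "Q4": 0.0, "Q5": 0.62}
print(arrange_items(tryout))  # ['Q3', 'Q5', 'Q1']
```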

These suggestions can be carried out by the classroom
teacher with a minimum of trouble. If this advice is
adhered to, the teacher will be rewarded by obtaining a
much higher degree of validity in his tests.

(20) Ibid, pp. 31.

Reliability

The second criterion of a good examination is the
reliability it possesses as determined by statistical
computation. "Reliability is synonymous with accuracy."(1)
A standard objective test does not measure up to the
demands made upon it unless it is what it purports to be,
viz., a scientific measure of pupil achievement in the
individual subject or subjects for which it was intended.
Just as a construction engineer would soon discard a faulty
surveying instrument because of its inaccuracy, so too the
modern teacher must be ready to disregard the published
test that does not possess a satisfactory degree of
reliability. If two thermometers were placed outdoors
and were exposed to practically identical conditions of
sunlight and wind velocity, it would be normal to expect
them to register the same temperature. If one read 70
degrees F. and the other 80 degrees F., the observer would
be in a quandary as to which to believe. Obviously, one or
both would be in error, and a further check would have to
be made to determine which was correct. In a like manner,
two forms of a standardized test which cover the same
ability should yield approximately the same distribution
of scores if administered to the same group on successive
days under identical testing conditions.

(1) Odell, C. W., Educational Measurement in High School,
pp. 58, The Century Co., 1930.

"A test should measure accurately and consistently
whatever it attempts to measure. The degree to which it
does this is called its reliability."(2) To insure the
proper degree of reliability, the test must measure
accurately and consistently what it does measure; to insure
validity, it must measure what it is intended to measure.
Ruch says, "Reliability is second only to validity as a
. . . say that the second most important fact which we can
know about a test is the reliability which it possesses."(3)

(2) Lang, Albert R., Modern Methods in Written
Examinations, pp. 51, Houghton Mifflin Co., 1930.


Reliability is a much more restricted term than validity.
According to Ruch, reliability is one aspect of
validity.(4) Professor Odell expresses the same idea thus:
"If a test is not reliable, it cannot be valid, since if a
test does not measure whatever it measures accurately, it
cannot measure the thing it is supposed to measure."(5) It
is possible for a test to be highly reliable, however, and
yet have little validity. For example, suppose that a test
intended to measure general knowledge of commercial
geography required the reading of long or difficult
exercises or questions. The test might be highly reliable,
but might measure reading ability rather than geographical
knowledge.

The reliability of a test cannot be determined by an
examination of the test itself, although various inferences
about it can be drawn. It is the test-maker's duty to
perfect his instrument so that the desired degree of
reliability is obtained. The test-user will usually find
information about the test's reliability in the manual or
elsewhere, yet some tests are published without this
information. If this information is not forthcoming, the
assumption is that the test is new and its reliability has
not been computed, or that the test was not constructed by
an expert. The absence of this necessary information
tinges the examination with doubt as to its efficacy for
testing purposes. Odell says, "It has sometimes been
suggested that .90 be accepted as a standard for the
coefficient of reliability which all tests should attain.
No such exact critical point can be justified, but it is
probable that within a few years a majority of the tests
receiving wide use will have reliability at least this
high."(6) It is doubtful if any blanket statement can be
made about the requisite degree of reliability for tests,
because this varies with the function that the test is to
perform.

(3) Ruch, G. M., The Objective or New-Type Examination,
pp. 40, Scott, Foresman and Co., 1929.
(4) Ibid, pp. 41.
(5) Odell, C. W., Educational Measurement in High School,
pp. 59, The Century Co., 1930.

"Common sense will tell something about the probable
reliability of a test."(7) For example, suppose some broad
achievement, like knowledge of the use of notes and drafts
in banking, is being tested in commercial law. It is
obvious that a five-minute test consisting of merely two
questions would be a very inadequate sampling of the pupils'
knowledge of this subject. The results would be unreliable,
and the entire procedure manifestly unfair if used as the
sole basis for determining pupil achievement on this unit
of work. Such broad achievement cannot be measured
accurately in five or ten minutes. Whether or not the
reliability of a test is to be considered satisfactory
depends largely on the purpose for which the results are
to be employed. "Many tests that do not yield individual
scores reliable enough to be trusted do give fairly
accurate scores for groups of pupils of ordinary class
size or larger. In other words they are reliable enough
for use in judging the work of a class or of its teacher,
but not for that of individual pupils."(8) We can see,
then, that a survey test for measuring an entire school or
school system can yield reliable measures with a small
number of items (10 to 50, perhaps), since it is dealing
with the average scores of a large number of pupils. Yet,
if any individual pupil's score were taken at its face
value, it might be entirely unreliable. High reliability
is more important for diagnostic use than for general
survey purposes.

(6) Ibid, pp. 66.
(7) Ruch, G. M., and Stoddard, George D., Tests and
Measurements in High School Instruction, pp. 54,
World Book Co., 1927.
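Why group results can be trusted when individual scores cannot may be illustrated with a standard result that the quoted passage does not itself state: if each pupil's score carries an independent error with the same standard error of measurement, the error of a class average shrinks with the square root of the class size. The independence assumption and the 6-point individual error below are hypothetical.

```python
from math import sqrt

def error_of_class_mean(sem_individual, class_size):
    """Standard error of a class average, assuming each pupil's
    measurement error is independent with the same standard error."""
    return sem_individual / sqrt(class_size)

# Hypothetical: a short survey test whose individual scores carry a
# standard error of measurement of 6 points.
for n in (1, 25, 100):
    print(n, round(error_of_class_mean(6.0, n), 2))
# 1 6.0
# 25 1.2
# 100 0.6
```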

Principal Methods for Determining Test Reliability

Reliability is more of a statistical concept than
validity. "Reliability is most often stated in terms of
reliability coefficients; i.e., correlations between the
scores earned by a group of pupils on two equivalent forms
of a test."(9) It devolves upon the test expert to make the
necessary mathematical computations, as the classroom
teacher ordinarily would not be conversant with the methods
used. In general, the three most common measures of
correlation used, according to Odell, are the
product-moment method and two varieties of rank
correlation. It mystifies the ordinary teacher when the
coefficient of correlation obtained by the rank method does
not agree with that resulting from Pearson's product-moment
formula. Statistical experts have worked out methods,
though, for converting the coefficients of correlation
obtained by the different methods to an identical basis.
Odell says, "Although in a general sense any measure of
correlation may properly be called a coefficient, the
expression 'coefficient of correlation,' abbreviated (r), is
conventionally limited to the product-moment formula. It
ranges in value from +1.00 down through zero to
-1.00."(10)

(8) Odell, C. W., Educational Measurement in High School,
pp. 60, The Century Co., 1930.
(9) Ruch, G. M., and Stoddard, George D., Tests and
Measurements in High School Instruction, pp. 52,
World Book Co., 1927.
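Spearman's rank-difference coefficient (rho) is presumably one of the rank methods Odell has in mind, though the passage does not name them. The sketch below computes rho and applies Pearson's well-known conversion r = 2 sin(pi x rho / 6), which assumes normally distributed scores. The choice of rho, the handling of ties by mean ranks (with which the d-squared formula is only approximate), and the sample scores are all assumptions for illustration.

```python
from math import sin, pi

def ranks(values):
    """Rank values from 1 (highest) to n; ties get the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1          # positions i..j share this rank
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def spearman_rho(x, y):
    """Rank-difference coefficient: 1 - 6 * sum(d^2) / (n(n^2 - 1))."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))

def rho_to_r(rho):
    """Pearson's conversion of rho to an estimated r (assumes normality)."""
    return 2 * sin(pi * rho / 6)

# Hypothetical scores of six pupils on two forms of a test.
form_a = [48, 62, 55, 70, 41, 66]
form_b = [50, 58, 61, 72, 45, 60]
rho = spearman_rho(form_a, form_b)
print(round(rho, 3), round(rho_to_r(rho), 3))
```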

Ruch suggests three common methods of finding reliability
coefficients, viz.:

"1. By correlation of the scores from duplicate or
equivalent examinations administered to the same
pupils. This is ordinarily the most accurate
and defensible method.

2. By splitting the results from a single
examination into chance halves, correlating the
half-scores, and 'stepping up' the resulting
coefficient of correlation by means of the
Spearman-Brown prophecy formula.

3. By repeating the same test or examination after
an interval and correlating the results. This
is often called the 'retesting coefficient of
reliability.' This method should never be
employed when the first or second methods are
possible."(11)

(10) Odell, C. W., Educational Measurement in High School,
pp. 580, The Century Co., 1930.

Theoretically, the first method is the best one to use,
according to Professor Ruch. This method cannot always
be used, however, because many tests do not have duplicate
or equivalent forms. In many cases involving tests in the
social-business studies, recourse must be had to the
second method. In using this procedure with single-form
tests, all the odd-numbered items are considered one test
and all the even-numbered items a second test. The
coefficient of correlation between the scores of many
pupils on the odd-numbered items and of the same pupils
on the even-numbered items is then computed. The
resulting coefficient should then be corrected by the use
of the Spearman-Brown prophecy formula. Carlson reports
that: "This second method (Chance-Halves or Odds-Evens
Method) was the method used in determining the coefficient
of reliability of each of the Carlson Bookkeeping Tests and
each of the Peters, Greiner, and Green Commercial Law
Tests."(12) In all these computations it is important that
the scores from several hundred pupils be used for each
calculation.(13)
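The odd-even procedure can be sketched in Python. The answer sheets are invented, and far fewer than the several hundred pupils the text calls for. The step-up uses the Spearman-Brown formula k r / (1 + (k - 1) r) with k = 2, which is the "stepping up" from half length to full length; other values of k predict the effect of lengthening a test, which bears on the length-of-test factor discussed later.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def spearman_brown(r, k=2.0):
    """Predicted reliability of a test k times as long as the one giving r."""
    return k * r / (1 + (k - 1) * r)

def odd_even_reliability(answer_sheets):
    """Return (half-test r, stepped-up full-test r) by the odd-even method.

    `answer_sheets` holds one list of 1 (right) / 0 (wrong) per pupil,
    with items in test order.
    """
    odd = [sum(s[0::2]) for s in answer_sheets]   # items 1, 3, 5, ...
    even = [sum(s[1::2]) for s in answer_sheets]  # items 2, 4, 6, ...
    r_half = pearson_r(odd, even)
    return r_half, spearman_brown(r_half)

# Hypothetical answer sheets: 6 pupils, 8 items.
sheets = [
    [1, 1, 1, 1, 1, 1, 1, 0],
    [1, 1, 1, 0, 1, 1, 0, 0],
    [1, 0, 1, 1, 0, 1, 0, 0],
    [0, 1, 0, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 1, 0, 0],
]
half, full = odd_even_reliability(sheets)
print(f"half-test r = {half:.2f}, stepped-up r = {full:.2f}")
```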


(11) Ruch, G. M., The Objective or New-Type Examination,
pp. 415, Scott, Foresman and Co., 1929.
(12) Carlson, Paul A., The Measurement of Business
Education, pp. 11, South-Western Publishing Co., 1932.
(13) Ibid, pp. 12.

As has already been pointed out, the third method is
considered inferior to the other two. In this method,
the same test is supposed to be repeated after an interval
great enough to eliminate the memory effect and yet not
long enough for much true growth in ability to take place.
The question is: how is this interval to be judged? It is
logical to suppose that if any considerable period of time
elapses, a natural increase in ability will take place if
normal classroom conditions obtain. On the other hand, if
the test is repeated a very short time after it is given,
the pupils will be bound to remember at least some of the
answers that they recorded at the previous sitting.

According to Odell, the product-moment method of
correlation is generally considered the standard
method.(14) It would probably be appropriate at this time
to illustrate this method for the benefit of the reader.
Suppose we consider test results from two equivalent forms
of a test administered to the same group on successive
days under identical testing conditions. An example
involving the calculation of the coefficient of correlation
by the product-moment method is given on the next page.
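Since the worked example itself is not part of this text, the deviation form of the product-moment formula that such an example applies is given here for reference, in standard notation rather than the thesis's own: X and Y are a pupil's scores on the two forms, M_X and M_Y the means, N the number of pupils, and the standard deviations are taken as the square root of the sum of squared deviations divided by N.

$$
r = \frac{\sum xy}{\sqrt{\sum x^{2}\,\sum y^{2}}}
  = \frac{\sum xy}{N\,\sigma_X\,\sigma_Y},
\qquad x = X - M_X,\quad y = Y - M_Y,\quad
\sigma_X = \sqrt{\frac{\sum x^{2}}{N}},\quad
\sigma_Y = \sqrt{\frac{\sum y^{2}}{N}}
$$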


(14) Odell, C. W., Educational Measurement in High School,
pp. 588, The Century Co., 1930.

Professor Odell issues the warning that other information
should be obtained about the reliability of a test besides
the coefficient of correlation. He says, "Although the
coefficient of reliability is probably the most frequently
given measure of reliability, it is not very satisfactory
because its interpretation depends largely on the range of
ability in the group tested."(15) He suggests that other
measures of reliability be used to supplement it. He
proposes the following measures of reliability: "There
are four measures of reliability that are commonly employed
in connection with standardized tests. These are the
coefficient of reliability (r), the standard or probable
error of measurement (σ meas. or P.E. meas.), the ratio
of this error to the mean (m), and the ratio to the
standard deviation."(16) There is no doubt that this
additional information should enable the test expert to
judge better the accuracy of the test under consideration.
The drawback, however, is that the use of the mathematical
formulae involved is beyond the ken of the ordinary
classroom teacher and would have the effect of muddling
rather than clarifying the issue.
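For readers who do want the formulas, Odell's supplementary measures can be sketched with the standard textbook relations: the standard error of measurement is SD x sqrt(1 - r), and the probable error is 0.6745 times the standard error (a factor that assumes normally distributed errors). The passage does not say whether Odell's two ratios use the standard or the probable error, so this sketch uses the standard error; the mean, standard deviation, and r are hypothetical.

```python
from math import sqrt

def reliability_measures(sd, mean, r):
    """Supplementary reliability measures by standard formulas."""
    if not 0 <= r <= 1:
        raise ValueError("reliability coefficient must lie between 0 and 1")
    sem = sd * sqrt(1 - r)  # standard error of measurement
    pe = 0.6745 * sem       # probable error of measurement
    return {
        "r": r,
        "SE meas.": sem,
        "P.E. meas.": pe,
        "SE meas. / mean": sem / mean,
        "SE meas. / SD": sem / sd,
    }

# Hypothetical test: mean 60, standard deviation 12, reliability .91.
for name, value in reliability_measures(12.0, 60.0, 0.91).items():
    print(f"{name:16} {value:.3f}")
```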

The next point to be taken up concerns the factors that
influence the reliability of a test. Symonds suggests the
following:(17)

(15) Ibid, pp. 61.
(16) Ibid, pp. 60.
(17) Symonds, Percival M., Measurement in Secondary
Education, pp. 289-295, The Macmillan Co., 1930.

1. Objectivity - A test paper is said to be objective if
a number of correctors, working individually, give it
exactly the same score. If the judgment of the corrector
enters into the determination of the score, then the test
is called subjective. This subject will be treated in
greater detail later in the thesis.

2. Length of the test - This is a cogent factor in
considering the reliability of tests. Tests or examinations
covering a certain unit of work have to contain a reasonable
number of questions in order to cover adequately the
instructional materials. A fuller development of this topic
will be presented under the topic "Comprehensiveness," later
in this thesis.

3. Evenness of scaling - Care should be taken to include in
the test items that differ from each other in degree of
difficulty. From this standpoint, the test items should
cover the entire range of difficulty. Odell says, "Another
factor which influences reliability is the scaling or
arrangement with respect to difficulty of the items or
exercises in a test. A test which does not have a large
number of very easy or very difficult items but has more
items near the middle range of the ability of the group to
be tested is, other things being equal, more reliable than
one which does not contain such items. Also a test which is
scaled in finer units tends to be more reliable."(18)

(18) Odell, C. W., Educational Measurement in High School,
pp. 67, The Century Co., 1930.

Professor Symonds illustrates the inter-relation of
reliability and scaling by a little example, which he
demonstrates by means of two figures, as follows:

Figure 1 - First Test

Figure 2 - Second Test

He wishes to emphasize the point that the selection of items
should cover gradations of ability. "By selecting the items
in a certain way it is possible to construct a test of
twenty items so that it has no more reliability than a test
of less than ten items."(19) If in Figure 1 the first
vertical series represents five easy items that everyone
passes, and the last vertical row represents five very
difficult items that everyone fails, then these items are
practically useless in differentiating individuals or in
showing exactly what any individual's ability is. The
reliability of the test, then, would be hurt directly
because of this faulty scaling. Now consider the situation
in Figure 2. Here the items cover the entire range of
abilities, containing probably one or two so easy that all
pass and, probably, one or two so difficult that no one gets
a perfect score. Symonds concludes that: "If the items of a
test are equally spaced in difficulty as in the second
figure, there is no such loss in reliability due to
coarseness of the measuring scale."(20)

(19) Symonds, Percival M., Measurement in Secondary
Education, pp. 290, The Macmillan Co., 1930.

4. Conditions of the pupils taking the test - Professor 
Symonds gives the reader to understand that, while the
condition of the pupil at the time of taking the test is
a factor in its reliability, it does not merit as much
attention as some writers have given it. He says, "Evidently
the cause for test unreliability must be sought elsewhere
than in the general condition of the individual." (21) This
does not mean that one should entirely discount these
factors. Odell says, "The conditions under which tests are
given also exercise some influence on the equivalence of
scores, although if reasonable precautions are taken this
appears to be slight. For example, results will probably
agree slightly more closely if the two forms of a test are
given at the same time of day and perhaps even if given on
the same day of the week, but in most situations this is
not necessary." (22) It might cause a serious disagreement
between results, however, if one form of the test is
administered in the morning of one school day and the other
form is administered late in the afternoon of a successive
day just before the pupils are dismissed to


(20) Ibid, pp. 291. 
(21) Ibid, pp. 295. 


(22) Odell, C. W., Educational Measurement in High School, 
pp. 68, The Century Co., 1930. 
attend an athletic contest or some social event. To
obtain approximately the same scores on two forms of a
standard test, identical testing conditions should be
maintained.
5. Familiarity of the pupils with the technique of
taking tests - Many test experts now agree that pupils
should be taught to take tests. This is particularly
true in high school with pupils in the social-business
studies. In eight years of teaching in the high schools
of Massachusetts, New Jersey, and Connecticut, the writer
has observed many cases where pupils were unable to do
justice to a test because they did not understand the
objective form into which the questions were cast.
In a study that Professor Symonds made, he showed that much
of test unreliability is due to lack of training in the
technique of taking a test. In his conclusions, he says,
"I would like to suggest the possibility of lowering test
unreliability by means of systematic training. Pupils
should be taught to take tests." (23) If pupils are
unfamiliar with the type of tests used, the scores at the
first testing will in all probability not be as
representative of their ability as those secured later.

Professor Odell calls attention to the wording of
directions to pupils as affecting the reliability of a


(23) Symonds, P. M., A Study of Extreme Cases of 
Unreliability, Journal of Educational Psychology, 
15:99-106, February, 1924. 
test. Full directions should be given to the pupils, either
in the test or by the teacher, so that there will not be
any doubt in their minds as to what is expected of them.
He suggests the following points which the directions
should provide for:

"1. State briefly what the test is about.

2. Instruct pupils when to begin and where to
stop work, when to turn a page or not to turn
a page, and so forth.

3. Direct pupils whether to delay on each item
until they have answered it or to go ahead if
they do not know it.

4. Make clear the form of recording answers,
whether by writing words or numbers, underlining,
checking, or something else, including a
fore-exercise to illustrate the method of
response unless pupils are already thoroughly
familiar with it." (24)
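Symonds's point about scaling, that items which every pupil passes or every pupil fails do nothing to separate one pupil from another, can be checked mechanically once a test has been scored. The sketch below is a hypothetical illustration in Python, not a procedure taken from Symonds: it computes the proportion of pupils passing each item and flags the items passed by all or failed by all. The response data are invented for the example.

```python
# Hypothetical response matrix: one row per pupil, one column per item;
# 1 means the pupil answered the item correctly, 0 means incorrectly.
Responses = list[list[int]]

def proportion_passing(responses: Responses) -> list[float]:
    """Fraction of pupils who passed each item."""
    n_pupils = len(responses)
    n_items = len(responses[0])
    return [sum(row[j] for row in responses) / n_pupils for j in range(n_items)]

def non_discriminating_items(responses: Responses) -> list[int]:
    """Indices of items that every pupil passed or every pupil failed.

    Such an item adds the same amount (1 or 0) to every pupil's score,
    so it cannot help distinguish one pupil from another.
    """
    return [j for j, p in enumerate(proportion_passing(responses)) if p in (0.0, 1.0)]

# Five pupils, six items: item 0 is passed by all, item 5 is failed by all.
responses = [
    [1, 1, 1, 1, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 0],
]
print(proportion_passing(responses))        # [1.0, 0.8, 0.6, 0.4, 0.2, 0.0]
print(non_discriminating_items(responses))  # [0, 5]
```

The remaining items in this invented example are spread from easy to hard, which is the kind of gradation Symonds recommends; the flagged ones could be dropped without changing any pupil's rank.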

What is a satisfactory degree of reliability? 

The answer to this question must remain a qualified
one. Before any answer can be given, knowledge must be
had of the individual test and the function or functions
that it seeks to perform. Reliability coefficients are
correlations, and hence their magnitudes are influenced
by the range of abilities present in the group of pupils
used as the control group. Professor Odell says, "With
thirty minutes of testing where fifty or more objective
questions are asked, one may expect to get over .80 for
a reliability coefficient." (25)
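Because a reliability coefficient is a correlation, it can be computed directly from two sets of scores. The sketch below assumes the two-forms method discussed earlier (the same pupils take Form A and Form B of a test) and computes the Pearson product-moment correlation between the two sets of scores. The function name and the scores are hypothetical, chosen only for illustration.

```python
import math

def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson product-moment correlation between paired score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists of at least two scores")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical scores of the same eight pupils on two forms of one test.
form_a = [42, 35, 48, 27, 39, 44, 31, 22]
form_b = [40, 37, 47, 25, 36, 45, 33, 25]
print(round(pearson_r(form_a, form_b), 2))  # 0.97
```

As the paragraph above warns, the size of such a coefficient also depends on how widely abilities vary within the group tested, so a figure from a narrow group is not directly comparable with one from a broad group.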


(24) Odell, C. W., Educational Measurement in High School, 
pp. 67, The Century Co., 1930. 


(25) Ibid, pp. 299. 
If the teacher wishes to increase the reliability, he
can do so by increasing the testing time and including a
larger number of questions. Teachers are coming to
realize that, to make testing worthwhile, the test
results must be reliable. If the test results are to be
used for administrative or guidance purposes, the test
should have a reliability of at least .90.

Ruch and Stoddard (26) have formulated the following
table to assist the classroom teacher in interpreting the
reliability of a test:

Reliability Coefficient    Interpretation

0.95 to 0.99    Very high; rarely found among present tests.

0.90 to 0.94    High; equaled by a few of the best tests.

0.80 to 0.89    Fairly high; fairly adequate for individual
                measurement.

0.70 to 0.79    Rather low; adequate for group measurement
                but not very satisfactory for individual
                measurement.

Below 0.70      Low; entirely inadequate for individual
                measurement although useful for group
                averages and school surveys.


The authors hesitated to make the above statements, but
decided to do so because they felt that some concrete
criteria were due the reader.
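The Ruch and Stoddard table amounts to a simple lookup, which can be written down as a short function. This is a sketch: the band boundaries and wording come from the table, but the function name and the decision to treat each lower bound as inclusive (so that a value such as 0.895, which falls between the printed bands, goes into the lower band) are assumptions made for illustration.

```python
# Bands from the Ruch and Stoddard table: (lower bound, label, interpretation).
BANDS = [
    (0.95, "Very high", "rarely found among present tests"),
    (0.90, "High", "equaled by a few of the best tests"),
    (0.80, "Fairly high", "fairly adequate for individual measurement"),
    (0.70, "Rather low",
     "adequate for group measurement but not very satisfactory "
     "for individual measurement"),
]
LOW = ("Low", "entirely inadequate for individual measurement although "
       "useful for group averages and school surveys")

def interpret_reliability(r: float) -> tuple[str, str]:
    """Return the (label, interpretation) pair for a reliability coefficient.

    Each lower bound is treated as inclusive, so a value between the
    table's printed bands (for example 0.895) falls into the lower band.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie between -1 and 1")
    for lower, label, meaning in BANDS:
        if r >= lower:
            return label, meaning
    return LOW

print(interpret_reliability(0.83))
# ('Fairly high', 'fairly adequate for individual measurement')
```

By the standard stated earlier in this section, a test intended for administrative or guidance use would need to reach at least the "High" band.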

Objectivity 

The practising teacher recognizes that it is desirable, 
nay indispensable, to mark all pupils on the same basis. 
Depression conditions with their corollary, increased pupil 
(26) Ruch, G. M., and Stoddard, George D., Tests and 


Measurements in High School Instruction, pp. 56, 
World Book Co., 1927. 
loads, have caused the teacher to avail himself of
new-type test procedures in order to save time. Take,
for example, an ordinary class of thirty or thirty-five
pupils in Commercial Geography. A test of five essay
questions in this subject might consume only thirty
minutes of the pupils' time, yet the correcting of these
papers by the teacher would represent a long, tedious
chore. Even then, there is no assurance that the papers
will be graded on the same basis, since the teacher's
judgment enters into the marking. The teacher, being
human, is subject to the common human frailties. The
studies of Woods, Starch, Elliott, Kelley, and others
show that the traditional essay-type examination is
almost impossible to mark with complete accuracy because
of subjectivity. "An examination should eliminate or
minimize subjective judgments in scoring, and the degree
to which it does this is called its objectivity." (1)

The lack of objectivity in a test is one factor of
unreliability which can most easily be remedied. Usually,
if the teacher spends a little more time in the construction
of his new-type test, he can obtain the desired degree of
objectivity. Symonds says, "Of all the factors entering
into unreliability, lack of objectivity is perhaps the most
inexcusable, for it is usually possible, by exercising


(1) Lang, Albert R., Modern Methods in Written Examina-
tions, pp. 53, Houghton-Mifflin Co., 1930.
sufficient ingenuity, to turn a subjective test into an
objective test with little or no loss of validity." (2)

Many critics point out that the objective test leaves
much to be desired as a measure of actual achievement.
Their argument is, in effect, that if the test is made
completely objective it merely measures the acquisition
of facts rather than other, more desirable outcomes of
the educational process. Professor Brueckner says, "The
scope of these tests should be broadened so as to
include the outcomes of learning such as interests,
appreciations, ability to apply, and the like, which to
many are even more important than the outcomes that we
are now able to measure." (3) In an illuminating article
on neglected aspects of educational measurement,
Professor Uhl, after a brilliant survey of the problem,
concludes that: "Measurements as usually administered
fail signally to appraise certain of these forms of ..." (4)

Objectivity in tests, carried to extremes, may encourage
the development of dogmatism on the part of the pupils.
With objective tests in Commercial Geography, pupils often
get the idea that a certain answer is right and that no
other answer could possibly do. They think thus
(2) Symonds, Percival M., Measurement in Secondary
Education, pp. 290, The MacMillan Co., 1930.
(3) Brueckner, Leo J., The Validity and Reliability of
Educational Diagnoses, Journal of Educational Research,
September, 1935, pp. 4.
(4) Uhl, Willis L., Some Neglected Aspects of Educational
Measurement, The Journal of Educational Research,
December, 1933, pp. 241.
notwithstanding the fact that the proposed substitute
answer is merely the original answer paraphrased. The
teacher should clearly work against this tendency, since
the social-business subjects are, after all, intended to
inculcate a liberal outlook. H. A. Jeep says, "Because of
the impetus which objectivity has given dogmatism, the
present-day test often tends to block progress effectively
along other lines of ..." (5)
Other Desirable Characteristics 
I. Comprehensiveness 

It is doubtful whether a pupil's total knowledge of any
particular subject could be ascertained by existing test
methods. Even if it were possible, it is doubtful whether
the information obtained would be commensurate with the
time expended. Consequently, the situation resolves
itself into the question: "How can the requisite
information about the achievement and educational
development of the individual pupil be obtained with a
minimum of trouble?" The practical teacher takes advantage
of the principle of sampling and infers the pupil's
knowledge of the subject from the sample taken. It is
patent, then, that the examination, to be reliable, must
sample thoroughly. "It should cover a wide and
representative scope, and the extent to which it does
this is called its comprehensiveness." (1)


(5) Jeep, H. A., Must Objective Tests be Dogmatic, Educational
Administration and Supervision, March, 1933, pp. 181.


(1) Lang, Albert R., Modern Methods in Written Examinations, 
pp. 54, Houghton-Mifflin Co., 1930. 
Sufficient questions should be included in the
examination so that all phases of the subject are
adequately sampled. For example, suppose that the
Economics class has just completed a unit of work taking
three weeks' time. It is obvious, then, that a test
requiring about five minutes would give an incomplete
sample of achievement. Odell has this idea in mind when
he says, "Both from the standpoint of the content of the
test itself and of the reaction of the pupils, it is in
accord with common experience and proven by actual
experimentation that if the length of a test is
increased up to a reasonable limit, its reliability is
increased." (2) A point can be reached in the
examination, however, when additional items will have but
little material influence upon the results. Professor
Odell's rule regarding the length of tests is contained in
the following statement: "It has been shown for similar
tests, that is, tests covering the same subject or phase of
a subject and containing the same types of exercises, the
reliability of a pupil's scores increases approximately as
the square root of the increase in length." (3) In other
words, if one of two similar tests is twice as long as the
other, its reliability is approximately 1.4 times as great.
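Odell's square-root rule can be worked through numerically. The sketch below applies the rule exactly as quoted; the function name is hypothetical. Doubling a test gives a factor of about 1.41, matching his "approximately 1.4", and a test must be made four times as long to double the factor, which agrees with his remark that added items eventually have little influence. One caution, offered as an interpretation rather than as Odell's own statement: the factor cannot be applied literally to a reliability coefficient, since a coefficient cannot exceed 1.0 (a .80 test doubled would give 1.12). It reads more naturally as a statement about the accuracy of a pupil's score; for the coefficient itself, the Spearman-Brown formula is the usual tool.

```python
import math

def odell_length_factor(old_items: int, new_items: int) -> float:
    """Factor by which score reliability grows when a test is lengthened,
    following Odell's rule that it increases as the square root of the
    increase in length (for similar tests: same subject, same item types)."""
    if old_items <= 0 or new_items <= 0:
        raise ValueError("item counts must be positive")
    return math.sqrt(new_items / old_items)

# Lengthening a 50-item test to 100 and then 200 items.
for new_length in (50, 100, 200):
    print(new_length, round(odell_length_factor(50, new_length), 2))
# 50 1.0
# 100 1.41
# 200 2.0
```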

The examination should be divided into fine
measuring units, just as a yardstick is. It should not


(2) Odell, C. W., Educational Measurement in High School,
pp. 66, The Century Co., 1930.


(3) Ibid, pp. 66. 


contain many items so easy that everyone passes or,
conversely, so difficult that everyone fails. Odell says,
"Another factor which influences reliability is the scaling
or arrangement with respect to the difficulty of the items
or exercises in a test. A test which does not have a
large number of very easy or very difficult items but has
more items near the middle range of the ability of the
group to be tested is, other things being equal, more
reliable than one which does not contain such items." (4)
II. Facility 

To serve its full usefulness, the modern objective test
must save time for both the teacher and the pupil. The
greater amount of time the teacher spends in constructing
a new-type examination is offset by the reduced scoring
time. This should make for greater teaching efficiency,
as the time saved might well go into better lesson
preparation. The thought seems to be gaining momentum
that the teacher's time is too valuable to be spent in
correcting poorly written material when the entire
testing could be expedited by a test capable of being
scored in a more or less mechanical manner. Lang says,
"An examination should be easily administered and
scored, and the degree to which it meets this essential
requisite is called its facility." (1)
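Scoring "in a more or less mechanical manner" means comparing each answer with a fixed key, so that any scorer applying the key arrives at the same mark. The sketch below is a hypothetical illustration of that idea; the key and the answer sheets are invented. It counts right answers only: no correction for guessing and no partial credit is applied, and letter case and stray spaces are ignored.

```python
def score_paper(answers: list[str], key: list[str]) -> int:
    """Count the answers that match the key (case and surrounding spaces ignored)."""
    if len(answers) != len(key):
        raise ValueError("paper and key must have the same number of items")
    return sum(
        1
        for given, correct in zip(answers, key)
        if given.strip().lower() == correct.strip().lower()
    )

# Hypothetical key for a five-item true-false and multiple-choice test.
key = ["T", "F", "c", "a", "T"]
papers = {
    "Pupil 1": ["T", "F", "c", "b", "T"],
    "Pupil 2": ["t", "T", "b", "a", "T"],
}
for pupil, answers in papers.items():
    print(pupil, score_paper(answers, key))
# Pupil 1 4
# Pupil 2 3
```

Because the scoring rule is fixed in advance, the result does not depend on the scorer's judgment, which is the property the preceding section calls objectivity.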


(4) Ibid, pp. 67.


(1) Lang, Albert R., Modern Methods in Written Examina- 
tions, pp. 56, Houghton-Mifflin Co., 1930. 




The facility of an examination involves two other
ideas, viz., the necessity for definite instructions and
the cost of preparation. Pupils taking the examination
should be told what is expected of them. In general,
these instructions should be contained in the test itself
rather than be given orally by the teacher. All of the
better published objective examinations contain concise
explanations together with fore-exercises in order to
explain to the pupils the nature of the exercise that
follows. Symonds says, "I would like to suggest the
possibility of lowering test unreliability by means of
systematic training. Pupils should be taught to take
tests." (2) As for the second idea, cost of preparation,
it is obvious that a published test would lose its
facility for the ordinary teacher if its cost were
prohibitive. With teacher-made tests, it is considered
better for each pupil to have a mimeographed copy of the
questions than to have them written upon the blackboard.
The cost, then, must not be allowed to become a serious
factor.
III. Utility 
This characteristic has to do with the practical use
to which the examination can be put. Educational practice


(2) Symonds, Percival M., Measurement in Secondary
Education, pp. 295, The MacMillan Co., 1930.
has progressed long beyond the stage when teachers gave
tests merely to provide "busy-work" for the pupils.
Equally archaic is the practice, followed by some
teachers, of giving an impromptu written test as soon as
the supervisor or principal steps into the room to listen
to the lesson. Such practices belong to other days and
should be eliminated if they reappear. The utility of a
test to the individual teacher, then, will depend upon
the teacher's education, teaching experience, and
educational philosophy. The progressive teacher plans
his testing program with the assistance and advice of his
supervisor, administers the tests, scores and grades them
as expeditiously as possible, diagnoses the results for
specific weaknesses, and then plans remedial instruction
to cover the particular shortcomings that have been
uncovered. Lang says, "The utility of a test is really
concerned with educational diagnosis and with its
adaptation for changing scores into meaningful ..." (1)
IV. Rapport 

One's interest in a task has a great deal of bearing
upon whether it will be done or not. It is human nature for
the pupil to do first the things that interest him and to
do grudgingly and half-heartedly those things in which he
lacks interest. No teacher can escape this situation today;
it devolves upon every teacher to motivate his work so


(1) Lang, Albert R., Modern Methods in Written Examina- 
tions, pp. 59, Houghton-Mifflin Co., 1930. 
that the smouldering interest of the pupils is fanned into
a bright flame. This attitude of the modern teacher
should carry over particularly into the testing program.
"An examination should create a feeling of interest and
at-easeness, and rapport is the degree to which this is
achieved." (1) If the test is made as interesting as
possible, the pupils will attack it with more zeal. Above
all, the test should be planned so that the pupils will be
satisfied with the fairness of the results. One of the
beauties of the objective test is that arguments about the
test mark are practically eliminated. An examination should
begin with a few easy questions that all can answer. Doing
so ensures the proper mental "set" as the pupil progresses
to the parts of the test requiring more concentration.
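One simple way to make sure the easy questions come first, assuming the teacher has kept a record of the proportion of pupils who passed each question on an earlier tryout, is to arrange the items from highest to lowest proportion passing. The sketch below does this; the item labels and proportions are hypothetical, and ordering the whole test by difficulty goes somewhat beyond the text, which asks only that the test open with a few easy questions.

```python
def easy_first(items: list[str], proportion_passing: list[float]) -> list[str]:
    """Return items sorted from easiest (highest proportion passing) to hardest.

    Python's sort is stable, so items with equal proportions keep their
    original relative order.
    """
    if len(items) != len(proportion_passing):
        raise ValueError("one proportion is needed for each item")
    order = sorted(range(len(items)), key=lambda i: proportion_passing[i], reverse=True)
    return [items[i] for i in order]

# Hypothetical tryout results for four questions.
items = ["Q1", "Q2", "Q3", "Q4"]
passing = [0.45, 0.92, 0.70, 0.15]
print(easy_first(items, passing))  # ['Q2', 'Q3', 'Q1', 'Q4']
```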


(1) Lang, Albert R., Modern Methods in Written Examina- 
tions, pp. 59, Houghton-Mifflin Co., 1930. 
E. STANDARDIZED TESTS versus INFORMAL, TEACHER-MADE TESTS


The test expert usually illustrates what "standardized
tests" are by explaining the differences that exist
between them and informal objective tests. Many times an
informal test becomes a standardized one after a long
period of seasoning and frequent revisions. We might say,
then, that the standardized test is the informal,
teacher-made test that has "graduated". Usually the
standardized test represents a more scientific and
accurate instrument than the teacher-made test; greater
care has been given to its preparation, and its validity
and reliability have been ensured by statistical
procedures. In degree of refinement, the standardized
examination is to the informal, teacher-made test as a
rapier made of the finest Milan steel is to the ordinary
butcher knife. Care should be taken, however, to make
sure that a test advertised as "standardized" is really
entitled to be so described. Ruch says, "Many well-known
standard tests are in fact fairly described as
more-or-less objective examinations with norms." (1)

A genuine standardized test must, however, meet much
more stringent requirements than mere possession of norms.
First of all, it should have "demonstrated validity resting
upon some more secure basis than personal opinion." (2)


(1) Ruch, G. M., The Objective or New-Type Examination, 
pp. 158, Scott, Foresman and Co., 1929. 


(2) Ibid, pp. 138. 
Its validity should be ensured by the methods described
previously in this thesis. All "dead timber" should have
been eliminated, and the examination should really test
what it is intended to test. An instrument sufficiently
refined to accomplish this must, of necessity, be the
product of careful experimentation. A specific weakness of
standardized tests is that they are not directly applicable
to the local school situation but represent the generalized
conditions prevailing in school circles. Ruch says, "The
validity of most standard tests is open to discussion
when we consider that local conditions vary so widely." (3)
Undoubtedly many standardized tests overcome this
limitation and can be used with little or no adaptation.
The great majority, however, should be viewed with
considerable suspicion. The logical attitude for the
teacher to assume is that neither the standardized nor the
teacher-made objective test should be paramount; that each
has values the other lacks; and that the findings of one
should be supplemented and checked by results from the
other.

Secondly, a "standardized test" should have "demonstrated
reliability." (4) This requirement looks to the
accuracy of the particular standard test as a measuring
(3) Ibid, pp. 140.
(4) Ibid, pp. 138. 
instrument. Professor Ruch issues the following note of
warning: "Unfortunately there are not a few standard
tests, held rather generally in high repute, which yield
reliabilities far below those ordinarily to be obtained
by a thirty-to-fifty-minute informal objective classroom
test of the unstandardized type." (5) The use of such a
test might result in a gross mis-measurement of the pupils,
and it is doubtful whether the teacher is justified in
using it in spite of the availability of norms. Conceding
that there are defects in the standard test, Professor
Symonds states that: "Scores on standard tests are not
absolutely accurate, but they are accurate enough so that
we may place considerable confidence in test results." (6)
It is doubtful whether Symonds intended this as a general
statement covering all classes of standard tests. What
should be done is to examine the particular standard test
included in the testing program to see whether it meets the
requirements of a "standardized" instrument. Otis' "Scale
for Rating Tests" might be helpful in this capacity.

The third requirement for a standardized test is that
it must have "a reasonable degree of objectivity of
scoring." (7) This requirement directs attention to one of
the main differences between the standardized and the
informal, teacher-made tests. As the former represents much more time


(5) Ibid, pp. 139. 


(6) Symonds, Percival M., Measurement in Secondary
Education, pp. 299, The MacMillan Co., 1930.


(7) Ruch, G. M., The Objective or New-Type Examination,
pp. 139, Scott, Foresman and Co., 1929.


and effort in the making, it is a more polished tool and
consequently has greater objectivity than the teacher-made
variety of test. Odell recognizes this when he says,
"A second characteristic of classroom tests is that
although they employ to a large extent the technics that
make for objectivity in standard tests and are therefore
much more objective than the traditional form of
essay-type examination, they are far less objective than
standard tests, since they depend on the teachers'
judgment for their ..." (8) It is safe to infer from this
statement alone that the standardized test has intrinsic
values that prevent it from being displaced by the
informal, teacher-made test, no matter how popular the
latter may be.
The final requirement of a standardized test is that
it have "norms or standards for evaluating the results
obtained by the test." (9) According to Professor Ruch, this
requirement is not as important as the preceding ones.
"The most important thing in all this discussion of
reasonable standards of attainment is the recognition by
test-users of the principle that no one norm of performance
can be set up which will have universal validity for all
pupils or all schools." (10) It is now generally recognized
that the norms on a particular standard test may or may not be of


(8) Odell, C. W., Educational Measurement in High School, 
pp. 58, The Century Co., 1930. 


(9) Ruch, G. M., The Objective or New-Type Examination, 
pp. 159, Scott, Foresman and Co., 1929. 
(10) Ruch, G. M., and Stoddard, George D., Tests and 
Measurements in High School Instruction, pp. 17, 


World Book Co., 1927. 
value to the teacher. Before any norm can be used, the
conditions under which it was obtained must be considered.
Obviously, a norm based upon the results of a standard
test in city schools would not be applicable to performance
in rural schools. By the same reasoning, a norm obtained
from high-ability pupils could not be used in interpreting
the scores of low-ability pupils; nor would it necessarily
be fair to compare the relative teaching abilities of two
teachers by having their respective classes take a
standard test and then comparing the test scores with the
norm. Such a policy would be manifestly unfair
because certain pedagogical methods stress drill and the
acquisition of knowledge, while others emphasize the
gaining of appreciations and the establishment of desirable
attitudes and ideals. All norms must be interpreted in the
light of the local teaching situation. How is it possible
to circumvent this limitation? Some authorities have
suggested that, while one norm is inadequate, many norms,
each obtained under different conditions, might be the
solution. Thus we should have a norm for results in rural
schools, one for city schools, another for low-intelligence
groups, yet another for high-intelligence groups, and so on
until all the varied conditions are covered.
Some authorities differentiate between norms and
standards; others use the terms as if they were
synonymous. Odell says, "It is unfortunately true that most
authors and publishers of tests have not attempted to set
up standards, but have merely reported norms and left the
determination of standards to those using the tests." (11)
The norm represents the results of existing conditions in the
schools surveyed; the standard, the final goal or ideal
to be approximated or attained.

In summary, the weight of opinion seems to be against
the use of blanket norms because of the diverse conditions
existing in the various school systems. Some authorities
hold that a norm should be derived for each different set
of conditions. Professor Ruch suggests that: "The
constructor of objective tests must seek other means of
interpretation than through the use of norms. Local norms
may be derived with the accumulation of records, and in
the long run, interpretations may be made quite as accurate
as practical demands require." (12) In other words, each
school system may derive its own norms from the results of
the standard test used over a period of time.
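Ruch's suggestion that each school system derive local norms from its accumulated records can be sketched as follows. The choices here are assumptions made for illustration, not prescribed by Ruch or Odell: records are kept as (group, score) pairs so that a separate norm is found for each set of conditions, the median serves as the norm, and a pupil's standing is expressed as a simple percentile rank. The group labels and scores are invented.

```python
from collections import defaultdict
from statistics import median

# Hypothetical accumulated records: (group, score) pairs gathered over
# several administrations of one standard test in a single school system.
records = [
    ("city", 54), ("city", 61), ("city", 47), ("city", 58), ("city", 66),
    ("rural", 41), ("rural", 49), ("rural", 38), ("rural", 52),
]

def scores_by_group(records: list[tuple[str, int]]) -> dict[str, list[int]]:
    """Collect the recorded scores separately for each set of conditions."""
    groups: dict[str, list[int]] = defaultdict(list)
    for group, score in records:
        groups[group].append(score)
    return dict(groups)

def local_norms(records: list[tuple[str, int]]) -> dict[str, float]:
    """One local norm (here, the median score) per group."""
    return {group: median(scores) for group, scores in scores_by_group(records).items()}

def percentile_rank(score: int, group_scores: list[int]) -> float:
    """Percent of recorded scores in the group that fall below `score`.

    This is the simplest definition; other conventions count half of any tied scores.
    """
    below = sum(1 for s in group_scores if s < score)
    return 100 * below / len(group_scores)

print(local_norms(records))  # {'city': 58, 'rural': 45.0}
rural = scores_by_group(records)["rural"]
print(percentile_rank(50, rural))  # 75.0
```

As records accumulate, the norms can simply be recomputed, which is the sense in which interpretation "in the long run" may become as accurate as practical demands require.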

The evidence given so far points to the inevitable 
conclusion that the standard test today is not a perfect 
(11) Odell, C. W., Educational Measurement in High School, 

pp. 452, The Century Co., 1930. 


(12) Ruch, G. M., The Objective or New-Type Examination,
pp. 66, Scott, Foresman and Co., 1929.
measuring instrument. Does this mean, then, that the use
of standard tests should be abolished? Even harsh critics
would not suggest this. The solution is to include both
informal, teacher-made tests and standard tests in the
well-rounded testing program. Gale Smith says, "Everyone now
realizes that standardized tests are indispensable in
school work, but from the point of view of the classroom
teacher, of the supervisor, and of the administrator, the
greatest opportunities in testing today are in the use of
new-type, objective tests which are not standardized." (13)
Odell concludes that: "One cannot avoid the conclusion
that no testing program for a semester, year or other long
unit of work in a subject can be well balanced unless it
includes both standardized and non-standardized tests.
Ordinarily, if not always, the number of the latter should
exceed that of the former, their respective proportions
depending partly on how satisfactory is the supply of
standard tests available in the subject being dealt with." (14)


(13) Smith, Gale, How to Construct and Use Non-Standardized 
Objective Tests, pp. 8, The Benton Review Shop, 1929. 


(14) Odell, C. W., Educational Measurement in High School, 
pp. 4735, The Century Co., 1930. 
F. FUNCTIONS OF TESTS


It is futile to attempt to plan an examination until
the function or functions to be served are clear in the
teacher's mind. Lang says, "The function of any activity
should be determined before the procedures and materials
are selected." (1) Behind each test or examination must lie
the test-maker's purpose for that particular test. Tests or
examinations that are given merely for the sake of "busy
work" are an educational crime. Many times teachers will
give written work, supposedly as a test, and no sooner
has the class filed out of the room than they take the
papers and relegate them to the wastebasket. When
challenged about this practice, they offer the lame excuse
that the pupils got the benefit of organizing their
thoughts and writing them down on paper anyway. It is
doubtful whether giving "tests" to keep the pupils occupied
for the period is a justifiable procedure in the light of
modern educational theory, and such a practice would
certainly be severely censured if it came to the attention
of the state supervisors.

Just as each course of study has aims and objectives
to fulfill, so each examination has its function or
functions as its underlying basis. The function of an
(1) Lang, Albert R., Modern Methods in Written Examina-
tions, pp. 20, Houghton-Mifflin Co., 1930.
examination may be likened to the tiller of a sailboat;
both give point or direction. The thought prevailing in
the modern test movement is that extra time spent in
planning and actually drafting the examination is more
than compensated for in the correction and interpretation
of results. Unfortunately, there is no agreement among
educational authorities as to the functions that an
examination is supposed to serve. Lang says, "In any
classification of functions of this kind there will be
more or less overlapping." (2) Each examination is almost
sure to include more than one function. For example, a
test aiming to determine the achievement status of the
pupils will also have some diagnostic value, although, of
course, not nearly so much as if the test were intended
primarily for diagnosis. The following nine examination
functions are suggested by Professor Lang:

I. Testing Retention of Information 

Teachers, as a rule, have difficulty in testing in the
social-business studies; hence the need for such a study
as this. All too often the classroom teacher tends to
overrate the amount of information the pupil has retained
after his study of a given unit of work. Lang says,
"Teachers and students alike are inclined to overestimate
the mastery of subject-matter which has been studied." (3)


(3) Ibid, pp. 22.


The meagre retention of information on the part of the
pupils, after so much supplementary reading and explanation,
is a source of concern to the teacher in the secondary
school. This problem can be solved in part, at least, by
giving unit tests systematically and then planning remedial
instruction on the basis of the results shown. The idea
that nothing short of complete pupil mastery should be
the goal has been advocated in certain quarters. The chief
exponent of this school of thought suggests the following
"mastery" formula: "Pretest, teach, test the result, adapt
procedure, teach and test again to the point of actual
learning." (4) It is generally conceded that there is genuine
merit in this formula. Whether pupil mastery is the goal,
or whether the teacher is satisfied with results short of
complete mastery, the efficient teacher must have a definite,
planned program of unit tests combined with a follow-up of
diagnosis and remedial instruction.

The check-up test will occasionally show results that are
quite alarming to the teacher, i.e., that the instruction
on a certain unit of work has been entirely ineffective.
Such a condition often arises at the beginning of the
year's work with groups that the teacher has not yet been
able to size up. If the school


= oe ee ee we oe ee ee Se ee ee es ee em ee Oe re ee em Se ce ee ee ee ee ee ee eS ee oe 


(4) Morrison, Henry C., The Practice of Teaching in the 
Secondary Schools, pp. 79, University of Chicago 
Press, 1926. 
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employs a psychologist, it would be a good idea to talk 
the problem over with him. He might advise that suitable 
psychological examinations be administered to ascertain the 
intelligence of the group. Moreover, because of the 
psychologist's past training, he could help the teacher 
adjust the pedagogical methods in order to meet the needs 
of this special group. If certain individuals then do not 
respond to this modified instruction, case studies must be 
made of them. 
II. Determination of Achievement Status 

So far we have concerned ourselves with the test 
covering a unit of work. In most cases, there will be the 
further necessity of determining the pupils' complete
status in a subject from time to time. Perhaps the
distinction between the two types of tests may be made
clear if the first is called a "check-up" test and the
second an "inventory" test. The former is intended to measure the
retention of information relative to a unit of work; the 
latter emphasizes all that has been attained up to a given 
time in a subject of study. 

This second function of examinations is one of the most
important; in fact, according to some educational
authorities, it is the most important examination function.
Professor Ruch holds that: "The measurement of achievement
has been admittedly the principal reason for examinations." (5)


Examinations administered to determine the achievement status
of pupils in a certain subject or in an entire school
system have sometimes unearthed startling deficiencies in
the present educational program. These results have been
seized upon by carping critics of the secondary schools,
who have been delighted to herald them far and wide. Lang
says, "It is from the standpoint of determining achievement
status that examinations have been submitted to the most
severe censure and the most unfriendly criticism." (6)

The results from achievement tests are not only invaluable
to the classroom teacher in judging the effectiveness of
his instruction, but also of extreme value to the
executives in bringing about the proper orientation of
pupils. Suppose that a pupil transfers from a small,
unknown secondary school to a large, up-to-date city high
school. A question would naturally arise as to whether the
pupil's development was commensurate with the new strain
to which he would be subjected. The unaided judgment of
the executives alone would be unreliable in the
grade-classification of such a pupil. This dilemma could
be solved by testing the pupil's development in the new
subjects for which he has enrolled. If he falls down
markedly in any of the tests, it is safe to assume that he
has an insufficient background in that subject or subjects,
and his educational guidance will have to be varied
accordingly. This conclusion, of course, is predicated
upon the premise that searching "inventory" tests are
administered in the subjects in question. The other uses
of achievement tests are legion.

(5) Ruch, G. M., The Objective or New-Type Examination,
pp. 16, Scott, Foresman and Co., 1929.

(6) Lang, Albert R., Modern Methods in Written Examinations,
pp. 26, Houghton-Mifflin Co., 1930.

Another type of achievement test that has received
wide publicity in recent years is the "pre-test". Lang
says, "Many teachers now give a pre-test before assigning
a new unit of work." (7) In discussing the need of a testing
program, Professor R. G. Walters says, "Achievement tests
should do three things: discover what the student knows;
how well he understands what he knows; and how well he can
apply what he knows. In other words the achievement tests
should consist of fact, thought, and application questions." (8)
By giving a pre-test on a new unit of work, the teacher
obtains the information necessary to plan his development
of the subject.

Needless drill on the phases of the unit with which
all the pupils are familiar would not only deaden pupil
interest but also be an insensate squandering of time.
The Morrison Unit Plan presupposes the giving of a
pre-test in the "exploratory" stage of a unit for the
reason mentioned above. The main value of the pre-test
is that it indicates the foundational preparation of the
pupils for a new unit of work.

(7) Ibid, pp. 27.

(8) Walters, R. G., Modern Methods of Teaching Commercial
Subjects, Monograph No. 16, pp. 23, South-Western
Publishing Co., 1932.

The general survey test is yet another achievement
test that enables school work to be evaluated. This type
of test frequently embraces the schools in an entire
system. The Rochester School Survey in Senior High School
Social Studies may be cited in this connection. This
survey, starting in September, 1925, extended over a period
of two years. A number of valuable conclusions were
arrived at as a result of this cooperative testing movement.
"The starting of the survey work really put under way six
important movements in the field of the senior high school
social science: re-weighing and clarifying of objectives;
re-organization of subject-matter in the light of these
clearer objectives; re-organization of classroom procedure
for the betterment of the chosen aims; better articulation
between departments both in the same schools, and among the
different schools, and between different school levels;
greater knowledge of how to test scientifically, and to
compute and use statistical results; finally, the
establishment of more scientific remedial work." (9) This
cooperative testing program was planned carefully by
committees of teachers. The first year was to be given
over to study and experimentation in the individual
schools; the second, to the administration of a series
of uniform city-wide survey tests. Mr. Gibbons
summarizes the values from the first year thus: "Possibly
the greatest value that came from that first year
concentrated upon experimental factual testing was the
unanimous conviction that the objective of factual mastery
must be subordinated to higher aims." (10) The general values
accruing from a self-survey of the Rochester type are
bound to contribute toward a raising of professional
teaching standards throughout the system.

(9) Gibbons, A. N., Tests in the Social Studies: A
Record of a Testing Experience in the Senior High
School Social Studies, National Council for Social
Studies, pp. 7, Athens Press, 1929.

That the achievement survey has vital significance
to the teaching staff is the contention of most authorities.
Professor Van Wagenen maintains the following: "The most
important purpose of an achievement survey probably consists
in acquainting the teachers with the actual educational
conditions existing in the school system, the realization
of its strong and weak points as well as its present
standards of achievement." (11)

(10) Ibid, pp. 19.

(11) Van Wagenen, M. J., Educational Diagnosis and the
Measurement of School Achievement, pp. 224,
the MacMillan Co., 1926.

III. Stimulation of Daily Work

Motivation is a term that looms large in the professional
teaching literature of the day. A lesson properly
motivated will go over with a minimum of trouble, and,
conversely, one lacking in motivation may be fraught
with petty annoyances or even serious discipline problems.
How to secure this all-important factor marks the difference
between an excellent teacher and a mediocre one. The
ideal situation would be one where the pupils are stimulated
to prepare their lessons adequately because of real joy
derived from the work itself, or because they realize its
probable usefulness. Lang says, "As every teacher knows,
however, ideal and natural motives do not always make a
strong enough appeal to students to stimulate adequately a
satisfactory type of daily preparation." (12) Since this is
true, the efficient teacher will many times have to resort
to extrinsic motivation. At all times the motivation should
be kept as closely associated with the activity itself as
possible. The incentive for study on the part of the
pupils should ideally be intrinsic, that is, it should arise
from the interest and joy the pupil takes in his studies;
lacking this, the promise of a test or an examination, even
though a less perfect motivation, should produce the
desired result of adequate lesson preparation.

Tests and examinations are effective means of stimulating
school work. Pupils in the secondary school are already
familiar with tests and examinations of all types because
of their previous educational experiences in the lower
schools. Professor Ruch says, "That examinations do
have this value has been tacitly agreed but never proved.
In spite of this dearth of proved fact, it does seem
reasonable to suppose that pupils strive for somewhat
greater and somewhat more permanent mastery when they
realize that searching examinations may be expected at a
later date." (13) The test or examination could be made a
more effective motivator if certain cardinal rules and
principles were constantly borne in mind. The first rule
is that pupils should have knowledge of the progress they
are making. When a test or examination is given to the
ordinary class, the pupils are on "pins and needles" to learn
what score or grade they obtained. In order to utilize this
interest, the test should be corrected and scored while
this feeling is at a high pitch. This is the time when
remedial work will reap its richest harvest. This is a
potent argument for giving short daily tests, having
the pupils exchange papers, and having the correcting done
by the pupils themselves. If the teacher then calls for
a show of hands as to the items missed, he has a good
basis for a remedial lesson with the pupils keyed to the
proper degree of receptiveness. Carlson argues that:
"Experimental psychology has repeatedly demonstrated that
pupils do much better when they are kept informed of the
results than when they are kept in ignorance of their
degree of progress." (14) In experimenting with his own
classes, the writer has found that posting test scores on
the bulletin board in the form of a bar graph has aroused
great interest. The length of each bar indicates the total
accumulated score for each pupil.

(12) Lang, Albert R., Modern Methods in Written Examinations,
pp. 27, Houghton-Mifflin Co., 1930.

(13) Ruch, G. M., The Objective or New-Type Examination,
pp. 10, Scott, Foresman and Co., 1929.
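The cumulative bar graph described above can be sketched in a few lines of Python. This is a minimal illustration, not the writer's own procedure: the pupil names and scores are hypothetical, each bar stands for a pupil's total accumulated score, and the `scale` parameter (an assumption) simply shortens long bars so they fit on a bulletin-board sheet.

```python
def cumulative_bar_graph(scores, scale=1):
    """Return one text bar per pupil; bar length is the accumulated score divided by scale."""
    width = max(len(name) for name in scores)  # align bars after the longest name
    lines = []
    for name, test_scores in scores.items():
        total = sum(test_scores)          # total accumulated score to date
        bar = "#" * (total // scale)      # one mark per `scale` points
        lines.append(f"{name:<{width}} {bar} {total}")
    return "\n".join(lines)


# Hypothetical pupils and their scores on two short tests.
scores = {"Adams": [18, 20], "Baker": [15, 22]}
print(cumulative_bar_graph(scores, scale=2))
```

After each new test the teacher would append the latest score to each pupil's list and reprint the chart, so every bar only ever grows.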

The second rule for increasing the motivating power
of tests and examinations is that they should come at
frequent intervals; a planned sequence should be arranged.
To give long tests infrequently is to delay the day of
reckoning so long that the pupils are not stimulated.
Carlson suggests that: "A twenty-five question, short-answer
test need not consume more than five minutes of
the class period for writing and ten minutes for correction
and tabulation of results." (15) It should be evident that the
giving of such a short test would dispense with the need
for a great deal of oral quizzing on the part of the
teacher.

A third important rule is that pupils should be made to
realize the help they can obtain by using tests of a
detailed, specific, and diagnostic character. At the outset,
the pupils will regard diagnostic tests as drudgery, but
their viewpoint will change when they begin to realize
the value of these tests.

(14) Carlson, Paul A., The Measurement of Business
Education, pp. 8, South-Western Publishing Co., 1932.

(15) Ibid, pp. 8.
IV. Motivation of Reviews 

The frequent review is an integral part of good
teaching. The teacher who fails to allow sufficient time
for review work is committing a grave error, since he is
not taking into consideration the psychological law of
forgetting. Lang says, "Learning no sooner takes place
than the process of forgetting sets in. The tendency to
forget is a natural process and a fortunate one. Many
trivial things and much that is not true are learned. It
would be a great handicap to learning if the mind were
permanently cluttered up with such material." (16)
Forgetting is a good thing because the mind is cleared of
many untruths that have been learned, and, further,
because it makes possible the selection and organization
of the particular material that it is desirable to make a
permanent possession.

The examination is an effective means for the motivation
of review work. The proper kind of review requires
the pupil to go over his material carefully, sifting out
the salient points from the unimportant, and then
organizing them in some sort of usable outline. Such a
review entails a critical and deliberate evaluation of the
material. It is not to be confused with the hasty and
superficial type of mental activity known as cramming.

(16) Lang, Albert R., Modern Methods in Written Examinations,
pp. 30, Houghton-Mifflin Co., 1930.
V. Provision of Objective Standards

How many times have teachers complained that their
classes this year are much poorer than those of preceding
years? All too often such an assertion is made to cover
up poor teaching on the part of the teacher. In most
cases, the teacher bemoaning the poor abilities of his
pupils is drawing this conclusion from his own experiences.
Such an unscientific judgment may or may not be true, with
the probabilities being against it. The teacher usually
falls back upon general impressions; such generalities
are more or less meaningless. To make a real comparison
of the work from year to year, objective tests would have
to be administered to each group in turn, and the results
compared with the norms or standards that have been derived.
Any judgment rendered then would be founded on actual
achievement as determined by suitable objective tests.
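As a rough illustration of comparing successive classes against a norm rather than against general impressions, the sketch below reports how far each year's class median falls above or below the norm of a test. The assumption that the norm is published as a median score, the norm value of 42, and all the class scores are hypothetical.

```python
from statistics import median


def compare_with_norm(scores_by_year, norm):
    """Return, for each year, the class median minus the test norm (positive = above the norm)."""
    return {year: median(scores) - norm for year, scores in scores_by_year.items()}


# Hypothetical scores on the same objective test given to two successive classes.
classes = {1931: [40, 45, 50], 1932: [30, 35, 44]}
print(compare_with_norm(classes, norm=42))
```

A negative figure for one year, measured on the same test, is evidence about that group rather than a general impression of it; whether the norm itself is sound is a separate question.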

The accumulation of examination scores over a period
of years on the various tests in a well-rounded testing
program is a great aid to the teacher in judging the
effectiveness of his instruction. Occasionally a group
will be met with that falls far below the established
standard. Such a situation is really a challenge, as
ordinary instructional methods will be ineffective.
The fact that this class has failed to measure up to
the standard is an index that the entire course will
have to be revamped in order to meet its needs. If
the teacher did not discover early that this group was
sub-normal, he might blame himself for poor teaching
when the standard pedagogical methods failed.

An attitude of complete slavishness to norms or
standards is not a good thing. As has been shown
elsewhere in this thesis, the method of deriving the norm
must be examined critically. When a test with norms is
used, the teacher must emphasize the points that appear
in it; otherwise the pupils will not attain the norm level.
In some instances, tests do include obsolete material.
Lang concludes that: "The use of standardized test norms
as objectives tends to perpetuate obsolete material and
to fix a traditional school procedure." (17)

(17) Ibid, pp. 33.

VI. Measurement of Teaching Efficiency

Both teachers and supervisors realize that teaching
efficiency must be gauged if the vocation is really to be
put on a professional basis. The question still persists
as to how this can be accomplished. Protagonists of the
new-type test immediately proclaim that their teaching
procedures and methods provide the needed succor.
At any rate, it is now generally agreed that the old
impressionistic methods of rating the teacher after a
five-minute sampling of his work should be consigned to
the museum of educational curios, never again to be
salvaged for use. Such methods were astounding in their
finality; unfortunately, they still persist in our
educational structure. Subjective ratings of any kind
are to be regarded with suspicion; and in the case of a
rating based on a sampling of five minutes' work, they are
doubly so. Something unforeseen might arise during the
sampling period that would be entirely unrepresentative
of the real work being accomplished in the course. If the
supervisor placed much significance on the sample, it would
result in a grave injustice to the teacher and a direct
blow at his professional standing. Supervision should be
conducted systematically if it is to serve its main
purpose, viz., the improvement of teaching efficiency.
Does the above indictment of subjective teacher-rating
methods mean that there is to be no supervision?
Positively not! There is a real need for the right kind
of supervision. To attain his full degree of usefulness, a
supervisor must drop the role of critic or stern judge
and assume the part of a friendly counselor who, by his
educational background and experience, is qualified to
offer the teacher guidance that will make the latter's
efforts more successful.
The problem of evaluating the efficiency of the
teacher by tests of his pupils' accomplishments is worthy
of serious consideration. Ruch says, "It was on this
point that Dr. J. M. Rice drew so much fire from the
National Education Association at the beginning of the
movement." (18) The prevalent thought was that the results of
standard tests administered to the classes of different
teachers would indicate relative teaching efficiencies.
This theory assumed that high accomplishment on standard
tests indicated effective teaching, and low accomplishment,
per se, unsatisfactory teaching. The weaknesses in this
point of view were soon exposed. Ruch holds that: "The
standard test method made no allowances for differences in
pupils' mental equipment, the most important single factor
controlling the rate of learning yet discovered." (19) A standard
test purporting to test pupil accomplishment in Commercial
Law would, of necessity, yield lower scores with a
low-ability group than with a high-ability one. Relative
teaching efficiencies of two teachers can be determined
only if they are working with groups of about the same
ability. Only after the school psychologist has established
that two groups have approximately the same mental
abilities can scores on standard tests be compared
fairly. Then, again, it must be realized that standard
tests are not entirely adaptable to local conditions,
and that they are open to abuse through coaching. For
these reasons, the head of the Commercial Department would
probably do well to make use of locally constructed tests
in solving the supervision problem. Professor Ruch
concludes that: "Where objectives and aims can be translated
into concrete test situations, supervision through locally
constructed tests is far more economical than personal
supervision." (20)

(18) Ruch, G. M., The Objective or New-Type Examination,
pp. 12, Scott, Foresman and Co., 1929.
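The argument that two teachers' results may be compared only when their classes are of about the same ability can be expressed as a guard condition. In this sketch the use of mean intelligence scores as the measure of ability, the three-point tolerance, and the sample data are all assumptions made for illustration, not figures from Ruch or the writer.

```python
from statistics import mean


def compare_teachers(scores_a, scores_b, iq_a, iq_b, tolerance=3.0):
    """Return the difference in mean achievement (A minus B), or None when the
    two classes differ too much in mean mental ability to be compared fairly."""
    if abs(mean(iq_a) - mean(iq_b)) > tolerance:
        return None  # groups not comparable; a score difference would mislead
    return mean(scores_a) - mean(scores_b)


# Hypothetical classes: first of similar ability, then of widely different ability.
print(compare_teachers([60, 70], [50, 60], [100, 104], [101, 103]))
print(compare_teachers([60, 70], [50, 60], [90, 94], [110, 112]))
```

Returning `None` rather than a number makes the point of the paragraph explicit: for groups of unequal ability the test scores say nothing about relative teaching efficiency.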
VII. Improvement of Teaching Efficiency 

The fact that a teacher's professional experience extends
over many years does not necessarily mean that he need not
concern himself with improving his efficiency. It is
possible that this extended period of service has merely
fixed habits that are faulty in respect to sound
educational practice. Even if the teacher is thoroughly
efficient, he must keep abreast of changes in the field
and adapt his methods to meet the changing situation.
Usually this can be accomplished by three means, viz.:
reading the best professional magazines, taking courses
in the graduate schools of great universities, and
attending teaching institutes and conventions. One of the
momentous problems that new superintendents face in coming
into a school system is how to improve the teacher in
service. The expression that a teacher "has gone to seed"
is often heard in teaching circles. This type of
individual must be helped to find himself, or he must be
eliminated from the system. It is a moot point whether
teacher tenure builds up in the teacher the false outlook
that, because his position is secure, he need not concern
himself with self-improvement. Lang says, "Improvement of
efficiency throughout the teaching career should be the
constant concern of every member of the profession." (21)

(20) Ibid, pp. 13.

It is only natural that the superintendent should be
concerned with supervising the teacher for the purpose
of helping the latter to gain greater teaching
effectiveness. Supervision is of such great importance
that it must be carried on systematically and with a
minimum of friction. Professor Van Wagenen says, "The
effectiveness of supervision in a school system is
dependent on several important factors: the confidence
of the teacher in the superintendent's or supervisor's
professional integrity, his range of professional
information, and his ability to show the teacher what
needs to be done." (22) The superintendent's efforts will be
abortive until he gains the goodwill and confidence of
his staff. Changes suggested by the superintendent as
a result of his class visitations should be reasonable
and based on logic apparent to the teacher.

(21) Lang, Albert R., Modern Methods in Written Examinations,
pp. 37, Houghton-Mifflin Co., 1930.

In his systematic visitations, the superintendent
or supervisor will have to use some teacher-rating method.
Alberty and Thayer say, "The motive for introducing
teacher-rating schemes is clear. It is to substitute an
objective and accurate appraisal of teaching success for
the old method of general impression." (23) They suggest
three general types of teacher-rating plans: "score cards
of teacher traits, man-to-man comparison scales, and
measurements of teacher efficiency based upon pupil
achievement." (24) The score card is probably the most widely
followed plan for rating teachers. It consists of a number
of traits that are essential in good teaching, listed so
that each one can be appraised separately. Needless to
say, a supervisor must have long experience in rating
teaching before he can use the score card or man-to-man
comparison scales with the proper degree of success. The
third plan, the measurement of teaching effectiveness by
the use of achievement tests, is of prime importance when
viewed in the light of the modern test movement. The
present writer contends that supervision would be greatly
facilitated by setting up in the secondary school a
well-rounded testing program consisting of both
teacher-made and standardized objective tests. The results
of these tests, carefully interpreted, should prove a boon
to the busy superintendent.

(22) Van Wagenen, M. J., Educational Diagnosis and the
Measurement of School Achievement, pp. 68,
the MacMillan Co., 1926.

(23) Alberty, H. R., and Thayer, V. T., Supervision in
the Secondary School, pp. 142, D. C. Heath & Co., 1931.

(24) Ibid, pp. 143.

That examinations afford great possibilities for
improving teaching efficiency is now being recognized
throughout the teaching profession. There are a number of
ramifications of this thought. In the first place,
examinations are conducive to teaching efficiency since
they supply the teacher with a knowledge of a student's
achievement status and a better understanding of his
shortcomings and difficulties. With this information, the
teacher is better able to adapt his work to pupil needs
and interests. Secondly, the construction of a good
examination compels a teacher to determine the objectives
of the course and of the different units of work, and to
organize the subject-matter so that these objectives will
be attained. All this presupposes thorough study by the
teacher of the instructional materials and should result
in greater thoroughness of preparation. Gale Smith
says, "The formulation of objective tests compels a
teacher to study, to organize, and to plan her work to
the most minute detail." (25) Such an intimate knowledge of
the subject-matter is almost certain to result in improved
teacher presentation.
The advantages of "unit teaching" have been extolled
by many leading authorities. Gale Smith says, "For
objective tests to be most effective as a supervisory agency,
the subject-matter being taught must be broken up into
small units. . . . Any subject naturally divides
itself into units corresponding to the different topics
and sub-topics which it includes. This natural division of
subject-matter should be the foundation for the work." (26)
It is always hard for an inexperienced teacher to plan the
units of work. As this bears upon the efficiency of
teaching, it is the supervisor's duty to assist in
laying out these teaching units. Until this is done, no
testing program worthy of consideration can possibly result.

In supervisory work, provision should be made for the
supervision of testing. In some school systems with which
the present writer has been connected, and in some that he
has visited, there is no supervision of the tests that the
individual teacher gives. After all, the supervisory
program embraces the supervision of testing as well as
the supervision of teaching. Hildreth calls attention
to this in the following statement: "Permitting teachers
to choose and use any tests indiscriminately is
indefensible and may result in great economic waste to the
school. Supervision of testing is as important as
supervision of instruction. . . . Although in the best
schools opportunity is always allowed for the exercise of
the teacher's discretion in such matters, there is always
some supervision to relate the work of each particular
teacher to the activities of the whole school, thus
insuring proper integration and uniformity in the work." (27)

(25) Smith, Gale, How to Construct and Use Non-Standardized
Objective Tests, pp. 114, The Benton Review Shop, 1929.

(26) Ibid, pp. 115.

VIII. Diagnosis of Special Difficulties 

Diagnostic testing and remedial teaching are the two
pillars of strength that support the whole superstructure
of classroom instruction. Effective teaching requires the
diagnosis of special difficulties in learning. In this
respect, the work of a teacher is much like that of a
physician. The former must know the characteristics of
learning difficulties and their remedies; the latter, the
symptoms of common diseases and their cures. Just as the
skilful practitioner probes his patient's ailment, so too
the resourceful teacher critically scrutinizes his pupil's
difficulty in order to determine the reason for its
existence. No competent physician would attempt to cure an
ailment before he was reasonably certain what he was up
against. Similarly, no progressive teacher would prescribe
mental pills for his pupils until he had sized up the
situation. "The first act in the teaching process should
be diagnostic testing, in order that the teacher may know
what to teach." (28) Such a test is called a pre-test by many
educational authorities. Professor Brueckner says, "The
theory back of this procedure is the same as that back of
all diagnosis, namely, the orientation of teaching." (29)
The results from a diagnostic test not only indicate
the progress a class is making but also show which
members of the class are not profiting by the instruction.
This information is of great importance to the teacher. If
the scores on the test are unusually low for all the students,
it is a pretty good index that the instructional methods
being used are not suitable. If the scores indicate,
however, that only certain pupils are falling behind,
special study must be made of the pupils obtaining these
low scores in order to find out the trouble. Gale Smith
suggests the following method of summarizing the results of
a teacher-made objective test. Let us take as an illustration
the scores on a Commercial Geography test administered in
Stamford High School.

(27) Hildreth, Gertrude H., Psychological Service for
School Problems, pp. 64, World Book Co., 1930.

(28) Carlson, Paul A., The Measurement of Business
Education, pp. 7, South-Western Publishing Co., 1932.

(29) Brueckner, Leo J., and Melby, Ernest O., Diagnostic
and Remedial Teaching, pp. 451, Houghton-Mifflin Co., 1931.


[Figure: Diagnostic Sheet summarizing the item results of the Commercial Geography test, Stamford High School]


The summarized results given on the diagnostic sheet
represent the scores obtained by a class of twenty-eight
pupils. Even a cursory examination of the results will
indicate that the questions represent varying degrees of
difficulty. For example, questions five, eight, fifteen,
and twenty-nine were hard for the pupils, as shown by the
large number of failures. This is good evidence that the
teacher should reteach these particular points in his
next lesson.
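The tabulation behind a diagnostic sheet of this kind amounts to counting, for each question, how many pupils missed it. The Python sketch below does that and flags the questions missed by more than a chosen fraction of the class. The 50 per cent threshold and the sample papers are hypothetical, and the layout of Gale Smith's actual sheet is not reproduced here.

```python
from collections import Counter


def item_failures(missed_by_pupil):
    """Count how many pupils missed each question (each paper lists its missed question numbers)."""
    counts = Counter()
    for missed in missed_by_pupil:
        counts.update(set(missed))  # set() so a question counts once per pupil
    return counts


def items_to_reteach(missed_by_pupil, class_size, threshold=0.5):
    """Return, in order, the questions missed by more than `threshold` of the class."""
    counts = item_failures(missed_by_pupil)
    return sorted(q for q, n in counts.items() if n / class_size > threshold)


# Hypothetical papers from a class of four; each list holds the questions missed.
papers = [[5, 8], [5, 15], [5, 8, 29], [2]]
print(items_to_reteach(papers, class_size=4))
```

The per-question counts are the same figures that a show of hands produces on the blackboard; lowering `threshold` widens the list of points marked for reteaching.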

It is quite probable that some teachers would object
to the use of the Analysis Sheet that Gale Smith proposes
on the grounds that it is too time-consuming. The
preparation of such a chart does not take much time, but
it may take more than some teachers can allow. If so, the
best procedure is to dispense with it and to obtain the
needed information by other means. Some teachers do the
work in the following manner. First, they dictate the
diagnostic test, allowing the pupils sufficient time to
answer each question. Next they direct the pupils to
exchange papers. The teacher then reads off the correct
answers. After this, each pupil counts up the number of
items correct and writes the number at the top of the
test; usually, he initials the test that he has just corrected.
When this is done, the teacher calls for a show of hands
as to the errors on each test item, and the total is written
on the front board after the number of the item. Now, with
the information before them, the class is able to attack
the items that were missed. Professor Carlson says, "Unless
the results of each test are followed up, they are not
worth the time and trouble which they require." (30)

The tendency toward the use of the Morrison "Mastery 
Formula" seems to be gaining.momentum. From the standpoint 
of waste elimination, this is a healthy trend. Apropos of 
this, Gale Smith says, "Recent research has tended to 
indicate, beyond doubt, that there has been entirely too 
much needless repetition of teaching. It is not necessary 
to teach again what the pupils already know. Our greatest 
waste in teaching comes, not on account of teaching retarded 
pupils who need it but on account of reteaching those who 
could be accelerated, and who do not need repetition. The
method of testing, diagnosis, reteaching, follow-up work
and retesting which is suggested will conserve time and
reduce waste in teaching."(31)
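The point of the testing, reteaching, and retesting cycle is that only pupils who have not yet mastered a unit are retaught. A minimal sketch of that sorting step; the 90 per cent mastery standard and the pupils' names are hypothetical, since no figure is fixed in the passage quoted:

```python
def split_for_reteaching(scores, total_items, mastery=0.9):
    """Divide pupils into those who have mastered a unit and those to reteach.

    scores      -- dict mapping pupil name to number of items right
    total_items -- number of items on the test
    mastery     -- hypothetical fraction of items counted as mastery
    """
    mastered, reteach = [], []
    for pupil, right in scores.items():
        if right / total_items >= mastery:
            mastered.append(pupil)
        else:
            reteach.append(pupil)
    return mastered, reteach


# Hypothetical class results on a 30-item unit test.
mastered, reteach = split_for_reteaching({"Adams": 28, "Baker": 19, "Clark": 30}, 30)
```

Only the pupils in `reteach` would then receive the follow-up work and the retest, which is where the saving of time described above comes from.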

In addition to the diagnostic work on ordinary tests, 
it is sometimes advisable to give tests that are avowedly 
even more diagnostic in nature. These should be focused
on the different units of work where weaknesses crop up.

(30) Carlson, Paul A., The Measurement of Business
Education, pp. 7, South-Western Publishing Co., 1932.


(31) Smith, Gale, How to Construct and Use Non-Standardized 
Objective Tests, pp. 113, The Benton Review Shop, 1929. 
Perhaps a distinction should be made between the
survey test and the diagnostic test at this point. The
first indicates what has not been learned; the second, why
it has not been learned. Odell says, "The tests employed,
whether standardized or home-made, should in so far as
possible not merely show what errors pupils make or what
gaps there are in their knowledge, but why the errors are
made or the gaps [...]."(32) A diagnostic test will be, of
necessity, much more detailed than the survey test. In
most diagnostic tests several questions or exercises are
inserted that deal with the same facts or processes on the
theory that the pupil's deficiencies will be shown up more
clearly. Odell says, "Another quality of satisfactory
diagnostic tests is that they should frequently contain
several questions or exercises dealing with the same facts
or processes. The purpose of this is that teachers may
know from the results obtained whether or not pupils really
know or do not know the points [...]."(33)
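Odell's point, that several items on the same fact let the teacher tell whether a pupil really knows it, can be sketched as follows. The three-way verdict (all items right, none right, mixed) is an assumed rule for illustration, not one Odell states, and the point labels are hypothetical:

```python
def diagnose(items_by_point, correct):
    """Judge each point that is tested by several items.

    items_by_point -- dict mapping a fact or process to the item numbers on it
    correct        -- dict mapping item number to True if the pupil got it right
    The verdict rule below is an illustrative assumption.
    """
    verdicts = {}
    for point, items in items_by_point.items():
        right = sum(1 for i in items if correct[i])
        if right == len(items):
            verdicts[point] = "knows"
        elif right == 0:
            verdicts[point] = "does not know"
        else:
            verdicts[point] = "uncertain; reteach"
    return verdicts


# Hypothetical diagnostic test: three items on each of three points.
items_by_point = {"point A": [1, 4, 7], "point B": [2, 5, 8], "point C": [3, 6, 9]}
correct = {1: True, 4: True, 7: True,
           2: False, 5: False, 8: False,
           3: True, 6: False, 9: True}
verdicts = diagnose(items_by_point, correct)
```

A single item on point C would have given either a false "knows" or a false "does not know"; the repetition is what exposes the mixed case.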

When the teacher comes in contact with an obviously
maladjusted pupil, he must draw on his professional
experience in order to ascertain the exact nature of the
trouble. It is right at this point that the inexperienced
teacher falls down. An experienced teacher knows much more
about the physical causes of poor school work than his less
experienced colleague. Oftentimes a teacher with little
or no professional training is prone to brand a failing
pupil as moronic because the latter does not learn readily.
In reality, the pupil's poor progress might be due to
something entirely unrelated to his ability. Hildreth says, "McCall
has found the chief causes of many of the learning difficulties
of school children to be insufficient practice, improper
methods of work, deficiency in fundamental skills, absence
of interest, physical defects, and subnormal [intelligence]."(34)
If the pupil's learning difficulties are so deep-seated that
they are not discovered by the diagnostic methods
employed by the teacher, special clinical methods will have
to be used and the aid of the school psychologist enlisted.

(32) Odell, C. W., Educational Measurement in High School,
pp. 550, The Century Co., 1930.

(33) Ibid, pp. 550.
Little has been said so far about specific diagnostic
testing methods in the social-business subjects. Little
can be said, really, because of the limited sources of
information available. What has been written about diagnostic
testing applies to all subjects with the same force as
to the social-business group. Diagnostic testing and
remedial teaching are harder with the social subjects than
with other subjects. Brueckner and Melby say, "It is
evident from the foregoing discussion that remedial work
in the social studies becomes a more difficult and less
exact procedure than would be the case in a subject such
as [...]."(35) After all, the difficult thing in relation
to teaching subjects in the social-study group is that there
is no agreement among authorities as to the facts to be
taught. Brueckner and Melby point out that: "It is
readily apparent that the test-maker faces a more or less
baffling [...] in this field. In the first place there
is no complete agreement on the facts to be included."(36)
This imposes quite a hardship on the teacher, as he must make
the decision as to what facts should be included. Here,
again, an immature teacher is at a decided handicap, as he
has not sufficient background to judge the relative
merits of different teaching materials.

(34) Hildreth, Gertrude H., Psychological Service for
School Problems, pp. 147, World Book Co., 1930.

(35) Brueckner, Leo J. and Melby, Ernest O., Diagnostic
and Remedial Teaching, pp. 475, Houghton-Mifflin Co., 1931.

(36) Ibid, pp. 448.

More and more the use of varied teaching materials is
being stressed in the social-business subjects. It is
now considered sound teaching procedure to require the
pupils to consult many sources of information other than
their own textbook. After they get accustomed to this
library research work, the pupils really like it. A great
many pupils develop a real love for books as a result of
this preliminary training. Kimmel says, "The program of
wider reading in the social studies depends largely upon the
degree to which teachers are successful in arousing the
interest of pupils in reading. . . . The cultivation
of a taste for worthwhile books is one of the most important
goals of the efforts of teachers of the social studies."(37)
It is in reference to the reading program that the
teacher in the social-business subjects is put on his
mettle. How to discover and single out the pupils who
employ faulty reading methods is a problem with which the
progressive teacher must cope. Kimmel says, "New-type tests
furnish an essential part of the procedure in the development
of a program of remedial instruction in cases where
pupils have failed to gain the reading [...]."(38) New-type
reading tests have proved a wonderful source of help
in aiding the teacher to find out within a short time the
pupils who really need special assistance in the development
of correct reading habits. Johnson says, "The
ability to read effectively is perhaps the most important
single factor in success in those subjects which require
the use of books. A constructive program of supervision
should include a test of silent reading."(39) Failure by the
teacher to discover the pupils who cannot read properly
will many times result in the pupil's complete retardation.
This situation is inexcusable; after all, there is a
teacher responsibility wrapped up in every pupil failure.


(37) Kimmel, William Glenn, The Management of the Reading 
Program in the Social Studies, Publications of the 
National Council for the Social Studies, pp. 24, 
McKinley Publishing Co., October, 1929. 

(38) Ibid, pp. 64. 

(39) Johnson, Franklin W., Administration and Supervision 
of the High School, pp. 389, Ginn & Co., 1925. 
IX. Cultivation of Intellectual Powers

Examinations tend to build up within the individuals 
taking them certain intellectual habits that are important 
in everyday living. In an examination a pupil is thrown 
entirely on his own resources. Such a situation gives 
important training in concentration and self-reliance. 
The examinee is under the immediate necessity of mustering 
all the information he has acquired about a given topic or 
question and organizing it in his mind. Then, he must 
sift the information to obtain the particular part or parts 
that bear upon the question. All of this represents train- 
ing that should prove invaluable in the development of the 
average pupil. Lang includes the following list of mental
powers: "Application, concentration, persistence,
self-reliance, and [...]."(40) It is evident that the
cultivation of these intellectual habits is an important
phase of the school work pertaining to testing.


(40) Lang, Albert R., Modern Methods in Written Examina- 
tions, pp. 42, Houghton-Mifflin Co., 1930. 
G. PROCEDURES FOR DRAFTING NEW-TYPE TESTS


The combination test is given a prominent place in
any treatise on tests and measurements because it is made
up of both recognition and recall tests. Lang says,
"Frequently it is called a battery test, because of the
arrangement of a number of similar testing devices in
groups or sets for producing a united measurement."(1)
It is clear that the combination or battery test is
suitable for the longer and more comprehensive term and
final examinations. A more adequate sampling of the
instructional materials is obtained because of this greater
length; hence the test reliability is increased. The
present writer proposes, in this chapter, to consider first
the method for drawing up a combination test, and secondly,
the principles and rules underlying the preparation of
its different parts, i.e., the true-false section, the
completion section, the matching section, the
multiple-choice section, etc. It must be borne in mind that the
various tests included in the combination test will vary
depending upon the test-maker's judgment as to what test
forms are appropriate. In some cases, the combination test
will consist of true-false, completion, and matching sections;
in others, the multiple-choice or other test forms
will be included. The test forms included will depend on four
factors, viz., the teacher's personal preference, the nature
of the subject-matter, the method by which the test is to
be given, and the main function the test is expected to
perform.

(1) Lang, Albert R., Modern Methods in Written Examina-
tions, pp. 184, Houghton-Mifflin Co., 1930.

At this point, let us consider briefly the advantages 
of the combination test. In the first place, test construct- 
ion is made easy by the use of a combination test. All 
test-makers have more or less difficulty in moulding the 
subject-matter into certain test forms. If a variety of
test forms is to be used, there are greater possibilities
that if one test form does not lend itself readily, another
will. Secondly, the combination test adds rapport to the
examination. If a variety of test forms is used, the
interest of the examinee is sustained. A long examination 
consisting entirely of one test type breeds monotony and 
results in a lagging of pupil interest. Thirdly, the 
combination test gives the mental abilities a wider scope 
because of the variety of test forms employed. The results 
are more reliable when various types of mental reactions 
toward the subject-matter are stimulated and then sampled. 
It is apparent from the above reasons that the combination 
test fulfills an important function in the well-rounded 
testing program. 


It is highly important that the test-maker follow some 
general plan in the building of the examination. Just


as a contractor is guided by the architect's plans in the 
construction of a house, so too, the test-maker must rely 
upon a fundamental plan for the building of his examination. 
Professor Ruch suggests an excellent procedure for the 
drafting of new-type examinations. The present writer 
includes it because of the attention that it attracted. 
Ruch suggests ten steps that must be carried out in the 
rearing of the new-type test. 
I. Drawing Up a Table of Specifications 

The Table of Specifications is a general guide or 
outline in the building of a test. Such a table ensures
that the entire subject-matter will be covered. It prevents,
also, the over-emphasis of minor topics which results in 
the improper balance of the sampling. This table is really 
a skeleton outline of the instructional materials. Suppose 
that a unit in Commercial Geography contains six main 
groups of ideas, and that these groups vary in their relative 
importance. In the table, each group title would be shown 
as a main heading and each would be supported by the minor 
thoughts listed below it. Now, the test-maker would be
under the necessity of weighing the importance of each 


group and assigning a percentage value to indicate each one's
relative importance. For example, suppose that the weights
given the six groups were 10%, 10%, 15%, 30%, 20%, and 15%
respectively. This would mean that the first group should
contain approximately 10% of the test items; the second,
10%; the third, 15%; etc.

(2) Ruch, G. M., The Objective or New-Type Examination,
pp. 149, Scott, Foresman and Co., 1929.

Another important point to observe is that each main 
group in the table should be given a key letter in order 
to identify it. This procedure should make for greater 
order as each test item could be numbered with the key 
letter of the group to which it pertains. 
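The weighting and key-letter scheme above can be turned into item counts mechanically. A minimal sketch, assuming hypothetical key letters A to F for the six groups and the example weights; rounding leftovers to the largest fractional remainders is a choice made here so the counts always add up to the intended total, not a rule from Ruch:

```python
# Hypothetical key letters A-F for the six main groups, with the
# percentage weights from the example above.
weights = {"A": 10, "B": 10, "C": 15, "D": 30, "E": 20, "F": 15}


def allocate_items(weights, total_items):
    """Split total_items among the groups in proportion to their weights.

    Rounds down first, then gives any leftover items to the groups with
    the largest fractional remainders, so the counts sum exactly.
    """
    weight_sum = sum(weights.values())
    exact = {k: total_items * w / weight_sum for k, w in weights.items()}
    counts = {k: int(v) for k, v in exact.items()}
    leftover = total_items - sum(counts.values())
    by_remainder = sorted(exact, key=lambda k: exact[k] - counts[k], reverse=True)
    for k in by_remainder[:leftover]:
        counts[k] += 1
    return counts


def label_items(counts):
    """Number each item with its group's key letter, e.g. "D-7"."""
    return [f"{letter}-{i}" for letter, n in counts.items() for i in range(1, n + 1)]


counts = allocate_items(weights, 100)
labels = label_items(counts)
```

For a 100-item test the counts equal the percentages; for an awkward total such as 37 the rounding rule still yields exactly 37 items.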

II. Drafting the Items in Preliminary Form 

The next step consists of drafting the preliminary 
test items using the Table of Specifications as a guide. 
In completing this step, the percentages included in the 
table are not to be followed implicitly. To do so would
impose a real hardship upon the test-maker. It is sufficient
if test items are formulated covering each topic and
subtopic in turn. Time should not be taken at this point to
produce refined test items.

Ruch says that the important tasks are: 


"1. Covering the field thoroughly but at the same 
time avoiding trivial points. 


2. Deciding which test form is best suited for
handling the particular question in mind."(3) 


(3) Ibid, pp. 153.
It is suggested that 3" by 5" library cards be used,


and that the item be double-spaced to allow for correction. 
Library cards may well be used as they may be rearranged, 
discarded, shuffled, etc., without necessitating any re- 
writing of the other items. Each card should contain four 
things: the key letter, the test item, the indicated 
answer, and a temporary sequential number. 
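The four entries on each card map naturally onto a small record type, and the advantage claimed for cards (items can be discarded or reshuffled without rewriting the rest) carries over to a list of such records. A minimal sketch; the class name and the two sample items are hypothetical:

```python
import random
from dataclasses import dataclass


@dataclass
class ItemCard:
    """One draft test item, mirroring the four entries on a library card."""
    key_letter: str  # group in the Table of Specifications, e.g. "A"
    item: str        # the test item as drafted
    answer: str      # the indicated (keyed) answer
    number: int      # temporary sequential number


cards = [
    ItemCard("A", "The Mesabi Range is located in the State of ____.", "Minnesota", 1),
    ItemCard("B", "The most important city in southern California is ____.", "Los Angeles", 2),
    ItemCard("A", "A poorly worded draft item.", "?", 3),
]

# Discarding or shuffling cards leaves every other card untouched.
cards = [c for c in cards if c.number != 3]
random.shuffle(cards)
```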

At this point it is wise to remember that more test 
items should be written than will probably be used. Ruch 
holds that these items should aggregate 25 to 50 per cent 
more than the estimate calls for. The extra items ensure
two things, viz.: 


1. The culling out of poorly worded items is made
possible. 


2. A better balance of the emphasis between the main
topics of the test may be obtained. 


III. Deciding Upon the Length of the Test 

Let us assume in the drafting of the preliminary test- 
items that they number two hundred fifty. It would be
possible, then, to allow for a shrinkage of fifty items if 
only two hundred items were necessary to exhaust the subject.
This situation would be ideal from the standpoint of the 
test-maker because the latter would be enabled to plan two 
forms of the same test with one hundred items each. 
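The arithmetic of the example (two forms of one hundred items, drafted with Ruch's 25 to 50 per cent allowance) can be written as a one-line rule. A minimal sketch; the function name and defaults are hypothetical:

```python
import math


def items_to_draft(items_per_form, forms=2, overdraft=0.25):
    """Number of preliminary items to write, allowing for shrinkage.

    overdraft -- extra fraction drafted beyond what the final forms need;
                 Ruch's suggested range is 0.25 to 0.50.
    Rounds up, since a fraction of an item cannot be written.
    """
    needed = items_per_form * forms
    return math.ceil(needed * (1 + overdraft))
```

With the defaults this gives the two hundred fifty items of the example; at the top of Ruch's range it gives three hundred.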


There are a number of reasons why two forms are more 
desirable than a single form. In the first place, one
form may be administered to the class and the alternate 
kept for absentees. Secondly, the alternate form can 
always be used for re-testing pupils who wish to take a 
test on a unit they have failed. Thirdly, the two forms 
may be used in rotation year after year. If the first form 
is used one year, the alternate may be used the next and 
the danger from old examination papers being handed down by 
last year's pupils will be minimized. Lastly, if a longer 
and more comprehensive test is desired, both forms can be 
administered. 
IV. Editing and Selecting the Final Items 

At this time the culling out of ambiguous and poorly- 
worded items has to be done. It has been thought best 
to allow a little time, preferably a day or two, to elapse 
after the preliminary drafts of the items. In this interval, 
the test-maker has a chance to mull the material over in 
his mind. It is possible that at the end of this period, 
the test-maker will have a greater understanding of the 
real significance of the individual test items. 

The test-maker should be thoroughly conversant with 
the principles and rules of rhetoric and grammar. It is of 
extreme importance that good sentence structure be used in 


the drafting of the individual test items. Poorly-worded 
items lower the test validity to a marked degree. Each
item must be gone over with the same degree of care that 
an editor uses in conning an important manuscript. It is 
a good idea for the teacher to put himself in the pupil's 
place, and try purposely to misread the meaning of the 
test items. If any item is not perfectly clear, it either 
should be discarded or rewritten. 
V. Rating the Items for Difficulty

There is an advantage to rating the items for difficulty and
arranging them in increasing order of difficulty. If this were
done, the point to which a pupil progressed in his long term
examination would indicate the pupil's degree of mastery in
the subject-matter. At best, these ratings are not very
accurate since they are highly subjective estimates. This
rating may be done on a five-point scale. The rating "one"
will be given to items so easy that all or nearly all of
the pupils may be expected
to answer them correctly. The other numbers of the scale 
indicate ascending degrees of difficulty. A “five” rating 
designates the items which are thought to be so difficult 
that all or nearly all of the pupils will fail. Teacher 
rating of items is made more accurate if a number of 
teachers pool their ratings. 
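Pooling several teachers' ratings can be done by averaging each item's ratings on the five-point scale. A minimal sketch; averaging is one reasonable way to pool and is an assumption here, and the sample ratings are hypothetical:

```python
def pooled_difficulty(ratings_by_teacher):
    """Average several teachers' 1-5 difficulty ratings for each item.

    ratings_by_teacher -- list of dicts, one per teacher, mapping item
    number to rating (1 = nearly all pupils should answer correctly,
    5 = nearly all will fail). Every teacher must rate every item.
    """
    for ratings in ratings_by_teacher:
        for item, rating in ratings.items():
            if not 1 <= rating <= 5:
                raise ValueError(f"rating {rating} for item {item} is off the scale")
    items = ratings_by_teacher[0].keys()
    return {
        item: sum(t[item] for t in ratings_by_teacher) / len(ratings_by_teacher)
        for item in items
    }


# Hypothetical ratings of two items by three teachers.
pooled = pooled_difficulty([{1: 1, 2: 4}, {1: 2, 2: 5}, {1: 1, 2: 4}])
```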

VI. Breaking the Items Into Equivalent Forms 


It is at this point in the building process that the
examination begins to take form. As a result of this step,
the test-maker will have two roughly equivalent forms of
the examination. Up to now, it has been discovered that
certain items are better than others for test purposes.
Certain items have been found to be too easy, too difficult,
or vague in meaning. These unsatisfactory items must now
be eliminated until the numbers shown by the Table of
Specifications are approximated.

The next step deals with the sorting of the test items 
between the two test forms. This can be accomplished by 
taking the cards and dealing them into two piles exactly 
as playing cards would be dealt. It is intended here to 
equalize the forms through the law of chance. The net 
result will be that each form will have one-half of the 
test items pertaining to each of the six main topics 
included in the Table of Specifications. 
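Dealing the cards alternately gives each form half of every topic only if the cards are first stacked by key letter; that stacking, and shuffling within each letter to leave the rest to chance, are assumptions of this sketch rather than steps stated in the text. Cards are represented here as hypothetical (key letter, number) pairs:

```python
import random
from collections import Counter
from itertools import groupby


def deal_into_forms(cards, forms=2, seed=None):
    """Deal cards round-robin into piles, as playing cards are dealt.

    cards -- list of (key_letter, number) pairs
    Cards are stacked by key letter and shuffled within each letter, so
    each form receives half of every topic (within one card).
    """
    rng = random.Random(seed)
    stacked = []
    ordered = sorted(cards, key=lambda c: c[0])
    for _, group in groupby(ordered, key=lambda c: c[0]):
        group = list(group)
        rng.shuffle(group)
        stacked.extend(group)
    piles = [[] for _ in range(forms)]
    for i, card in enumerate(stacked):
        piles[i % forms].append(card)
    return piles


# Hypothetical deck: four items on topic A and six on topic B.
cards = [("A", i) for i in range(4)] + [("B", i) for i in range(6)]
form_one, form_two = deal_into_forms(cards, seed=1)
topic_counts = [Counter(letter for letter, _ in form) for form in (form_one, form_two)]
```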

VII. Rearranging the Items in Order of Difficulty 

If the test items have already been rated in difficulty, 
it is a simple matter to rearrange them after the elimination 
of faulty items has been effected. 
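Once each surviving item has a pooled difficulty rating, rearranging a form is a sort on that rating. A minimal sketch; breaking ties by the temporary item number is an assumption made so the order is repeatable, and the sample ratings are hypothetical:

```python
def order_by_difficulty(form, ratings):
    """Arrange a form's item numbers from easiest to hardest.

    form    -- list of item numbers on one form
    ratings -- dict mapping item number to its (pooled) difficulty rating
    Ties keep the lower temporary number first.
    """
    return sorted(form, key=lambda n: (ratings[n], n))


ordered = order_by_difficulty([3, 7, 1, 9], {1: 2.0, 3: 4.5, 7: 1.0, 9: 2.0})
```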

VIII. Preparing the Instructions for the Test 

No two authors on tests and measurements agree in
their instructions for objective tests. All concede the
necessity for clarity, fullness, and brevity in the test
instructions. In describing the instructions, the use
of the adjectives "full" and "brief" seems paradoxical.

The instructions should be sufficiently detailed so that 
they are readily understandable, and yet brief enough so 
that there is no excess verbiage. The need for very 
complete instructions will vary depending upon the ages 

and mentality of the groups in question. After the pupils 
have been subjected to repeated contacts with objective 
tests, they will become "test wise" and brief test instruct- 
ions will be adequate. 

In writing the instructions, it is wise to frame them 
so as to meet the level of the lowest mentalities in the 
group. The simplest synonyms for all words should be used.
A very good way to help the pupils orient themselves to a
test is to include fore-exercises with each section. The
[test] is a measure of achievement, not of ability to
follow directions. It has been the writer's experience that
good pupils sometimes fail because they do not interpret 
the test instructions correctly. This situation is especially 
noticeable in cases where the objective test involves a type 
of test new to the pupils. For example, consider the Carlson 
bookkeeping test dealing with the working sheet. In this 
test, the arithmetic has been "dehydrated". In other words, 


the pupils are required to indicate the extensions of the 
five classes of bookkeeping items by means of check marks.
The writer in his teaching experience has never found a 
secondary school class that has been able to interpret 

this type of test and solve it correctly, notwithstanding 
the fact that the pupils had previously received a thorough 
training in the use of the working sheet. 

The pupils should be informed whether to hurry or to 
work slowly and carefully. If tests are timed, this should 
be made known in advance. Another important point concerns 
the answering of doubtful and unknown items. Authorities 
differ as to the advisability of instructing pupils to guess. 
Dr. Ben D. Wood uses instructions against pure guessing. 
Dr. W. A. McCall contends that the more guessing there is,
the more adequate the statistical correction for guessing. 
A study of the evidence reveals that the weight of opinion 
is against guessing because of bad habit formation. 

IX. Marking the Answer Keys or Stencils 

The nature of the answer key or stencil will depend on 
two factors, i.e., the nature of the test and the number to 
be scored. It is patent that an elaborate scoring device 
will not be necessary if merely the papers of an ordinary 
class are to be scored. Furthermore, the function of the
test must be taken into consideration. A test purporting 


to serve as a short check-up test will not necessitate the 
labor involved in the making of an indestructible scoring


stencil. To spend time in the planning of such a stencil 
would represent an unwise expenditure of the busy teacher's 
time. It is probable that the best system to follow under 
such circumstances would be to score one of the mimeographed 
test papers and lay this alongside the individual test 
papers as they are scored. Ruch issues the reminder that:
"It is ordinarily unwise to use any plan of scoring which
calls for actual reading of the test items and the pupils'
[responses]."(4)

The two principal types of devices for indicating 
responses are as follows: 


1. Aligned response columns, usually vertical in
position, e.g.,

The Mesabi Range is located in the State of ................ ______
The most important city in southern California is .......... ______

2. Staggered response blanks, e.g.,

One of the principal products of China is corn, gold, wheat,
tea, ironwood.
Ohio is bounded on the west by Missouri, Indiana, Iowa,
Illinois, Michigan.

Even a casual study of the above test items shows
the economy of time in the use of aligned responses. They
are always possible with simple recall, matching, and
true-false tests. They are often possible with multiple-choice
tests when the method of response is by number, rather
than by underlining.

(4) Ruch, G. M., The Objective or New-Type Examination,
pp. 184, Scott, Foresman and Co., 1929.

Professor Ruch classifies answer keys and stencils 
as follows: 


"1. Strip keys for aligned vertical columns of
response blanks or response words in such tests as:
   a. Simple recall
   b. Numbered multiple-choice
   c. Matching
   d. True-false (especially the + -, the + 0, or
      the writing of T and F, etc.)
2. Transparent celluloid or tissue-paper stencils
for such tests as:
   a. Unnumbered staggered multiple-response
   b. True-false, yes - no, same - opposite, etc.,
      when underlined
3. Cut-out stencils for such tests as:
   a. Staggered (ordinary) completion
   b. Staggered computation
4. Answer sheets for reference."(5)
The strip key is a very common form of stencil. It
consists of a strip of heavy cardboard from one-half to
one inch wide and the length of the test page. The correct
answers are written or printed on the key in such a way that
when the key is placed in juxtaposition with the test paper
the answers and the test items are parallel. If the key has
been prepared properly, the short eye span between the answer
and the item should make for speed in correction.

(5) Ibid, pp. 176.


Transparent stencils may be made either of tissue 
paper or of celluloid sheets. If made from the former, 
they have only a limited period of usefulness and then 
will have to be replaced. A celluloid stencil should last 
a much longer period of time; in fact, for many years. A 
celluloid stencil marked with launderer's ink and dipped 
in white shellac is almost indestructible. In marking, 
this type of stencil is superimposed upon the actual test 
page and if the marking on the stencil fails to coincide 
with the marking on the test paper, the item is incorrect. 

The cut-out stencils serve much the same purposes as 
the transparent stencils. They are troublesome to prepare, 
although less expensive than the transparent type. Like 
the latter, they render their main service in reference to 
the correction of tests arranged in staggered response 
blanks. To make a cut-out stencil, take a sheet of thin 
cardboard, a piece of carbon paper, and a mimeographed test
paper. Superimpose the test paper on the cardboard with 
the carbon paper in between. Draw rectangles around each 
answer blank large enough to include the pupil's answer. 
Now take the cardboard sheet and cut out the rectangles 
that have been traced upon it. Below the opening of each 


rectangle, write the correct answer to that particular 
block. It can readily be seen that when this stencil is


superimposed upon the individual test papers, it will be 
an easy matter to check errors. 

The use of an answer sheet in scoring papers has already 
been explained. It consists merely of a mimeographed test 
with the correct answers written in. This is the only type 
of key used by many teachers. 

X. Deciding Upon Rules for Scoring 

There are a few general truths that should be emphasized 
at this time about scoring. Most experts hold that partial 
credits should not be given. Questions should be worded in 
such a way that they can be marked either completely right 
or completely wrong. In completion tests one point credit
should be given for each blank which is correctly filled.
Matching tests give one point credit for each pair correctly
matched. Some teachers make the mistake of attempting to
weight test items for difficulty or relative importance.
This is not considered a commendable practice, as it is based
on mere teacher opinion, which is a highly subjective factor.
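The scoring rules above (one point per correctly filled blank, one point per correctly matched pair, no partial credit and no weighting) can be sketched as two small functions. Ignoring case and surrounding spaces when comparing written answers is an assumption of this sketch, as are the sample responses:

```python
def normalize(answer):
    """Compare written answers without regard to case or stray spaces (an assumption)."""
    return answer.strip().lower()


def score_completion(responses, key):
    """One point for each blank filled as keyed; no partial credit, no weights."""
    return sum(1 for given, right in zip(responses, key)
               if normalize(given) == normalize(right))


def score_matching(pairs, key):
    """One point for each pair matched as in the key.

    pairs, key -- dicts mapping each left-hand entry to a right-hand entry
    """
    return sum(1 for left, right in key.items() if pairs.get(left) == right)


completion = score_completion(["Minnesota ", "san diego"], ["Minnesota", "Los Angeles"])
matching = score_matching({"1": "c", "2": "a", "3": "b"}, {"1": "c", "2": "b", "3": "a"})
```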

Any section upon rules for scoring would not be complete
if certain controversial issues involving correction
for chance effects were not mentioned at this time. In
two-response tests (including true-false, yes - no, same -
opposite, etc.) and in multiple-choice tests, the scores
should be corrected for chance. In two-response tests,
including the true-false type, the correction formula is:

$$ S = R - W $$

where
S = the corrected score
R = the number right
W = the number wrong

Sometimes this same formula is expressed in the following
way: score = number attempted - two times the number wrong.
Either formula yields the same corrected score for any given
test paper, since the number attempted is R + W and
(R + W) - 2W = R - W. In correcting multiple-choice tests for
chance effects, the generally accepted procedure is:

$$ S = R - \frac{W}{n - 1} $$

where
S = the corrected score
R = the number right
W = the number wrong
n = the number of possible responses presented to the pupil.
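The two correction formulas translate directly into code. A minimal sketch with hypothetical scores; omitted items count as neither right nor wrong, as in the formulas:

```python
def corrected_two_response(right, wrong):
    """S = R - W, for true-false, yes-no, same-opposite, and similar tests."""
    return right - wrong


def corrected_multiple_choice(right, wrong, n):
    """S = R - W / (n - 1), where n is the number of responses offered per item."""
    if n < 2:
        raise ValueError("an item needs at least two responses")
    return right - wrong / (n - 1)


# Hypothetical 100-item true-false test: 70 right, 20 wrong, 10 omitted.
tf = corrected_two_response(70, 20)
attempted_form = (70 + 20) - 2 * 20  # number attempted - two times the number wrong
# Hypothetical five-response multiple-choice test: 60 right, 20 wrong.
mc = corrected_multiple_choice(60, 20, 5)
```

With n = 2 the multiple-choice formula reduces to S = R - W, so the two-response rule is its special case.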


It is reasonable to state that the guessing factor becomes
less important as the number of possible responses
increases. Lang says, "It has been found, however,
that if the number of the options are four or more the
guessing factor becomes sufficiently reduced to make the
correction for guessing unnecessary."(5) Studying opinions
from the leading authorities, it may be concluded that
correction for chance need not be made in the case of the
multiple-choice test if the number of suggested responses
number four or five.

(5) Lang, Albert R., Modern Methods in Written Examina-
tions, pp. 133, Houghton-Mifflin Co., 1930.
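The reasoning behind Lang's remark can be checked with a little expected-value arithmetic: on items answered by pure guessing, the expected raw gain falls from one half of the items at n = 2 to one fifth at n = 5, while the corrected score averages zero for every n. A minimal sketch, assuming each guess is right with probability 1/n; it illustrates the trend but does not by itself establish a four-option threshold:

```python
from fractions import Fraction


def blind_guess_expectation(items, n):
    """Expected raw and corrected scores on `items` questions answered by pure guessing.

    Each guess is assumed right with probability 1/n; the corrected score
    uses S = R - W / (n - 1).
    """
    p = Fraction(1, n)
    expected_right = items * p
    expected_wrong = items * (1 - p)
    corrected = expected_right - expected_wrong / (n - 1)
    return expected_right, corrected


# Sixty items guessed blindly, for two to five responses per item.
for n in (2, 3, 4, 5):
    raw, corrected = blind_guess_expectation(60, n)
    print(f"n = {n}: expected raw gain {raw}, expected corrected gain {corrected}")
```

Exact fractions are used so the zero expected corrected gain comes out exactly rather than as a rounding residue.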


Methods for Drafting Parts of the Combination Test 
So far we have considered in this chapter the combination
test. As we have previously seen, this type of test
consists of both recognition and recall items. Recognition
tests consist of three main types, i.e., true-false,
multiple-choice, and matching. Recall tests include two
well-known types, i.e., single-answer and completion. It
has been held that chance effects are a much more serious
problem in the case of recognition tests than with recall
tests. In reference to the completion test, Lang says,
"Item for item, it is probably the most reliable of all the
new-type tests."(6) A good combination test will
include both recognition and recall items.
I. The True-False Test

True-false testing appears to have given the original 
impetus to the new-type testing movement; hence, when 
objective testing is mentioned, some teachers think of true-
false tests only. The ability to discriminate between 
truth and falsity is a valid measure of the pupil's accomp- 
lishment in the subject-matter covered by the test. It is 


unfortunate that so much emphasis is placed upon this single 


type of test to the exclusion of other more valid types.


(6) Ibid, pp. 107. 
Some authorities feel that the true-false test does not
merit such emphasis when other more valid test forms are
available. Lang concludes that: "Its extensive but not
exclusive use is [...]."(7)
If the disadvantages of the true-false test are borne 
in mind, this test is a valuable aid in assisting the 
teacher to arrive at safe conclusions as to the accomplish- 
ments of entire classes or individual pupils. No single 
type of test is efficacious in bringing out the exact degree 
of progress that pupils are making. The status of a pupil 
or an entire class is arrived at by an averaging of all the 
results obtained from the well-rounded testing program. It 
is no mere accident, however, that the true-false test has 
been used to such an extent by the ordinary classroom teacher.
The following advantages are cited in its defense: 
1. It is the simplest and most adaptable of all the 
new-type test items because of the ease with which it


can be prepared and scored. 


2. It stimulates desirable mental processes and
attitudes. 


3. It can be given satisfactorily by the dictation 
method if duplicating facilities are not available. 


4. It ranks high in degree of rapport from the stand- 
point of pupils taking the test. 


5. It is possible to cover a wide range of subject- 
matter in a brief time. 


(7) Ibid, pp. 103. 
The two criticisms usually leveled against this test are the following:
1. It offers unprepared pupils a golden opportunity to guess the correct responses.
2. The false statements that it contains give pupils wrong impressions.

Let us consider these two criticisms in order to find out how valid they are. In considering the guessing factor, some authorities differentiate between "pure" and "brilliant" guessing. The former consists of making a chance decision when there is no basis of knowledge; in the latter, there is a definite basis of fact. While the former is to be deprecated, it is generally agreed that the latter is not undesirable, as it parallels a life situation. Countless situations arise in which decisions have to be made on the basis of knowledge already acquired, and the person who has the most adequate store of facts is the one who can guess best. The guessing factor does not cause serious concern if the test scores are corrected for chance effects. Lang says, "If the true-false test is sufficiently long and properly prepared, and if the score is corrected for guessing, the guessing factor is not serious."(8)
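The text does not spell out the correction for chance. For two-choice items, a common convention subtracts the number of wrong answers from the number of right ones and does not count omitted items. The sketch below assumes that rule, and the function name `corrected_true_false_score` is purely illustrative.

```python
def corrected_true_false_score(responses, key):
    """Score a true-false test with the rights-minus-wrongs correction.

    responses: list of True, False, or None (None means the item was omitted)
    key:       list of True/False giving the correct answer to each item
    """
    if len(responses) != len(key):
        raise ValueError("responses and key must be the same length")
    rights = sum(1 for r, k in zip(responses, key) if r is not None and r == k)
    wrongs = sum(1 for r, k in zip(responses, key) if r is not None and r != k)
    # With two choices a pure guesser expects one wrong answer for every
    # lucky right one, so each wrong answer cancels one right answer.
    return rights - wrongs
```

Under this rule a pupil who guesses blindly on every item expects about as many wrong answers as right ones, so the corrected score tends toward zero, while an omission costs nothing. For instance, `corrected_true_false_score([True, True, None, True], [True, False, True, True])` counts two rights and one wrong and returns 1.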

The second criticism really arises from a failure to distinguish between testing and teaching. Persons who raise this objection point to the negative-suggestion effects of true-false questions. They maintain that the false statements in the true-false test tend to leave the pupils with false impressions of the subject-matter. Now, the crux of the matter is this: if the test follows the teaching, the pupils should already be conversant with the subject-matter from all angles. Right impressions have been formed if the teaching has been effective, and the test is merely a measure of how well the learning has been done.

(8) Ibid, pp. 111.

As to the construction of the true-false test, any test-maker will find it profitable to examine C. C. Weidemann's monograph entitled "How to Construct the True-False Examination."(9) This monograph is one of the most comprehensive studies that have been completed within the last decade and should yield a wealth of information. A good true-false test cannot be constructed by taking a number of statements from the textbook at random and turning half of them into false statements. Ambiguities and partly true, partly false items can be avoided only if special care is taken in the planning and actual drafting of this type of test.

(9) Weidemann, C. C., How to Construct the True-False Examination, Teachers College, Columbia University, 1926. 118 pages.

Carlson suggests the following rules for preparing true-false tests:
"1. Prepare a long list of true statements covering the subject-matter to be tested. Then proceed to change all of these sentences into good true or false items.
2. Make approximately half of the items true.
3. Each test should contain at least 100 items in order to yield reliable and discriminating scores.
4. Each true-false item should be as short as possible. The standard length is from 10 to 20 words.
5. Avoid the use of negatives, especially double negatives.
6. Avoid the use of dependent or modifying clauses. Most of such statements may be separated into two good true-false items.
7. Avoid the use of items which are partly true and partly false.
8. Watch "specific determiners" such as "all", "always", "never", and "degree or comparison statements".
9. Duplicate, mimeograph, or print the test.
10. Make adequate provision for indicating the responses."(10)
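Several of these rules are mechanical enough to check automatically before a test is duplicated. The sketch below is a hypothetical helper, not anything Carlson describes. It flags items over 20 words (rule 4), negatives and double negatives (rule 5), and the specific determiners named in rule 8, and it reports the share of items keyed true (rule 2). The set of negative words is an assumption of this sketch.

```python
import re

# Determiners named in Carlson's rule 8; the list is illustrative only.
SPECIFIC_DETERMINERS = {"all", "always", "never"}
# Assumed set of negative words for rule 5 (not taken from the source).
NEGATIVES = {"not", "no", "none", "neither", "nor"}


def review_items(items):
    """Flag true-false items that break some of Carlson's drafting rules.

    items: list of (statement, is_true) pairs.
    Returns (warnings, share_true), where share_true is the fraction of
    items keyed true (rule 2 asks for roughly one half).
    """
    warnings = []
    for number, (statement, _) in enumerate(items, start=1):
        words = re.findall(r"[a-z']+", statement.lower())
        # Rule 4: the standard length is from 10 to 20 words.
        if len(words) > 20:
            warnings.append(f"item {number}: {len(words)} words, over 20")
        # Rule 5: avoid negatives, especially double negatives.
        negatives = [w for w in words if w in NEGATIVES or w.endswith("n't")]
        if len(negatives) >= 2:
            warnings.append(f"item {number}: double negative")
        elif negatives:
            warnings.append(f"item {number}: negative")
        # Rule 8: watch specific determiners.
        found = sorted(SPECIFIC_DETERMINERS.intersection(words))
        if found:
            warnings.append(f"item {number}: specific determiner(s) {', '.join(found)}")
    share_true = sum(1 for _, is_true in items if is_true) / len(items) if items else 0.0
    return warnings, share_true
```

A check of this kind can only catch surface features. Rules 6 and 7, on modifying clauses and partly true items, still call for the test-maker's own judgment.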


(10) Carlson, Paul A., The Measurement of Business Education, pp. 18, South-Western Publishing Co., 1932.

II. The Multiple-Choice Test
This type of test is intended to measure the kind and quality of reasoning. A true statement is made, and after it a number of choices are given as to why it is true. The examinee is required to study the possible responses and to select the one that represents the best answer. Great care must be taken in the preparation of this test to ensure that the possible responses, or "confusions", are of equal plausibility. The guessing factor may be minimized if the following precautions are observed:


1. The confusions in each test item should be listed in chance order.
2. The correct answer and the confusions should be equally attractive and familiar.
3. Provision should be made for correcting the test scores for chance effects.
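The text does not give a formula for the correction in precaution 3. A widely used rule for items with k choices subtracts the wrong answers divided by k - 1 from the right answers and ignores omissions. The sketch below assumes that rule, and the function name is hypothetical.

```python
def corrected_choice_score(rights, wrongs, choices):
    """Correct a multiple-choice score for chance: rights - wrongs / (choices - 1).

    Omitted items count neither way. With choices == 2 this reduces to
    rights minus wrongs.
    """
    if choices < 2:
        raise ValueError("each item needs at least two choices")
    if rights < 0 or wrongs < 0:
        raise ValueError("counts cannot be negative")
    # A pure guesser on n items expects n/choices rights and
    # n*(choices - 1)/choices wrongs, so the expected corrected score is 0.
    return rights - wrongs / (choices - 1)
```

For example, a pupil with 30 right and 12 wrong on four-choice items scores 30 - 12/3 = 26.0. A pupil who guesses blindly on 100 such items and happens to get 25 right and 75 wrong scores 0.0.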


From the standpoint of the practical commercial teacher, the one serious defect of this test is that with some types of subject-matter it is difficult, almost impossible, to get four satisfactory responses.

III. The Matching Test 

Matching tests may be used to measure either factual 
memory or judgment. In this test two sets of items or 
expressions are given and the examinee is required to match 
an item in one set with its correct answer in the other. 

The weight of authority favors limiting the size of the test to between ten and twenty pairs of items. It is desirable to have the items in at least one of the two columns arranged in alphabetical order; if this is done, the pupil is able to find the items more readily.

Care should be taken in the planning of this test to make one of the columns longer than the other in order to prevent perfect matching, in which the last pair or two can be completed merely by elimination. This can be accomplished by adding three or four plausible items to the second column.
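The effect of the extra items can be put in figures. The sketch below assumes that each answer may be used only once and that the pupil guesses the leftover pairs at random. Under those assumptions, it gives the chance that the guesses complete the test perfectly. The function name is hypothetical, and `math.perm` requires Python 3.8 or later.

```python
from math import perm


def chance_all_remaining_correct(pairs, answers, known):
    """Probability that random guessing completes a matching test perfectly.

    pairs:   number of items in the first column
    answers: number of items in the second column (pairs plus any extras)
    known:   number of pairs the pupil actually knows
    Assumes each answer is used at most once.
    """
    if not 0 <= known <= pairs <= answers:
        raise ValueError("need 0 <= known <= pairs <= answers")
    remaining_items = pairs - known
    remaining_answers = answers - known
    # Of the perm(remaining_answers, remaining_items) ways to assign the
    # leftover answers, exactly one is entirely correct.
    return 1 / perm(remaining_answers, remaining_items)
```

Take a test of 15 pairs with equal columns. A pupil who knows 14 pairs gets the last one free (`chance_all_remaining_correct(15, 15, 14)` returns 1.0). With three extra answers the same pupil has only a 0.25 chance (`chance_all_remaining_correct(15, 18, 14)`).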


Odell says, "Matching tests fit almost all subjects, although not all portions thereof; and where they are appropriate, constitute one of the few best types . . ."(11)
IV. The Completion Test 

Completion tests are included in the category of recall tests. One of their main advantages is that they are largely free from guessing and chance effects. In the ordinary type of completion test, blanks may be left almost anywhere in the sentence. Carlson says, "Modern usage favors the single blank space at the end or toward the end of each incomplete . . ."(12) Care must be taken in drafting the test items to ensure that key words are the ones omitted. As has been emphasized previously, the completion test rates highest in reliability among the new-type tests that have been devised.

(11) Odell, C. W., Educational Measurement in High School, pp. 492, The Century Co., 1930.

(12) Carlson, Paul A., The Measurement of Business Education, pp. 20, South-Western Publishing Co., 1932.
H. SUMMARY OF EXISTING PUBLISHED TESTS AND EXAMINATIONS
IN THE SOCIAL-BUSINESS STUDIES

Any writer dealing with the social-business subjects soon gets the impression that here is a fertile field for study and research. This impression becomes especially pointed when he comes to the phases dealing with tests and measurements. There is a crying need for competent workers in the test field. More and more tests dealing with the social-business studies are being published; even so, the published test materials are still inadequate. Dr. Tonne says, "The printed new-type test is apparently the least used form of test in the social-business subjects. The reason for this is obvious. As far as can be ascertained there are no printed new-type tests available in banking, history of commerce, advertising, salesmanship, business organization, and business English."(1) Since the test-makers do not supply published tests in these subjects, the conclusion is inescapable that the ordinary teacher must avail himself of the new-type test techniques and prepare his own tests.

Some of the best test materials that have been published deal only with the subject-matter of the particular book that they are intended to accompany. This type of test is termed "a published text-book test". The bad feature of such a test is that, if it is based on merely one text-book, its content does not necessarily cover the entire field of subject-matter. A partial answer to this objection is that most recognized text-books in the social-business studies parallel fairly closely the present knowledge about the subject-matter with which they deal. One of the chief deterrents in working out tests in the social-business studies is that educational experts do not agree as to what materials should be included in a specific subject.

(1) Tonne, Herbert A., and Tonne, M. Henriette, Social Business Education in the Secondary Schools, pp. 28, New York University Book Store, 1932.

The test material that has been published in the social-business studies has been favorably received and widely acclaimed. The gloomy picture painted in the above paragraphs should not be a source of consternation to the progressive teacher. Even if we must grant that there is a dearth of published tests, each year brings forward new suggestions and teacher-built tests that merit more than local importance. At this point, a summary of the published tests will be given.

I. Bookkeeping 

There is a question as to whether bookkeeping should be included in the social-business group. The answer depends upon the aims and objectives of the specific course. If the teaching emphasis is upon bookkeeping as the study of business rather than merely as the training of bookkeepers, the subject falls within the fold of the social-business studies. The present writer favors the teaching of bookkeeping as an introduction to business and, consequently, would include bookkeeping with the social-business studies.

The most important published bookkeeping tests in the field of secondary education follow:


1. Bookkeeping and Accounting Tests, Paul A. Carlson, South-Western Publishing Co., Cincinnati, Ohio.
Series A - 9 unit tests for the 15th edition of the "20th Century Bookkeeping and Accounting".
Series C - 6 unit tests for McKinsey's "Bookkeeping and Accounting".
Series D - 12 unit tests for the 16th edition of the "20th Century Bookkeeping and Accounting".
These tests are furnished without charge, in any quantity needed, to schools using the books with which they correlate. Series A, C, and D cost one cent a test for other schools. These tests comprise some of the best-known published tests in the entire field of Commercial Education. The author states that his tests have a high degree of validity, but he has not indicated why he is justified in making this statement. Printed norms are supplied with all the tests. The published coefficients of reliability for Series D are the following: Test 1, .895; Test 2, .937; Test 3, .939; Test 4, .918; Test 5, .896; and Test 6, .880. Professor Carlson is considered one of the pioneers in this field.


2. Bookkeeping and Achievement Tests, Charles E. Bowman, American Book Co., New York, New York.
Six unit tests based upon Bowman and Percy's "Principles of Bookkeeping and Business (Elementary Course)". Cost: One form or set, consisting of 15 copies of the same test unit, 20 cents; 6 sets, covering the 6 test units and assembled in one package, 96 cents. Manual and key, 12 cents. Achievement tests covering the "Advanced Course" have been prepared by Professor Atlee L. Percy and published by the American Book Co. Six unit tests. While these tests have been carefully prepared, there is no available information as to their validity and reliability.


3. Bookkeeping Tests, J. Hugh Jackson, Thomas H. Sanders, and Alexander H. Sproul, Ginn and Co., Boston, Massachusetts.
Eight series of tests (4 tests in each series). Series I-IV for the first-year course; Series V-VIII for the second-year course. To be used with Jackson, Sanders, and Sproul's "Bookkeeping and Business Knowledge". Cost: Per series, full package (30 copies of each of 6 tests), $2.40; per series, half package (15 copies of each of 6 tests), $1.20. These tests have been carefully prepared, yet there is no available information as to their validity and reliability.


4. Bookkeeping Tests, Fayette H. Elwell and James B. Toner, Ginn and Co., Boston, Massachusetts.
Eight series of tests (4 tests in each series), to be used in connection with Elwell and Toner's "Bookkeeping and Accounting". Cost: Per series, full package (30 copies of each of 4 tests), $1.60; per series, half package (15 copies of each of 4 tests), $.80. No information relative to the validity and reliability of these tests is available.


5. Elwell-Fowlkes Bookkeeping Test, F. H. Elwell and J. C. Fowlkes, World Book Co., Yonkers, New York.
Two parts, one for the end of the first semester and one for the end of the second semester. Two forms. Cost: 25 for $1.30, with manual of directions, key and class record; specimen set, $.35. There is no available information as to the validity and reliability of these tests.


6. Bookkeeping Tests, Fayette H. Elwell, Ginn and Co., Boston, Massachusetts.
Seven series of tests (4 tests in each series), to be used in connection with "Bookkeeping for Today". No information relative to the validity and reliability of these tests is available.


7. M. B. P. Objective Tests (Series A), Nathaniel Atholtz and Louis Broverman, Lyons and Carnahan, Chicago, Illinois.
Six parts covering important units of subject-matter in Atholtz and Klein's "Modern Bookkeeping Practice" (First Year Course). Cost: They may be secured without cost other than postage for shipping where the Modern Bookkeeping Practice text is used in class work. There is no available information as to the validity and reliability of these tests.


8. Rational Objective Tests in Bookkeeping and Accounting (Series A), Clyde Insley Blanchard, Gregg Publishing Co., New York, New York.
Ten unit tests on the bookkeeping usually taught the first year. Cost: 5 of any one test for 10 cents. Teacher's edition, including one set of 10 tests, manual of instructions, and one set of keys, 25 cents net. There is no available information about the validity and reliability of these tests.


9. Rational Objective Tests in Bookkeeping and Accounting (Series B), D. T. Deal, Gregg Publishing Co., New York, New York.
Twelve unit tests, each on one of the chapters of Belding and Greene's "Rational Bookkeeping and Accounting". Cost: 5 of any one test for 10 cents. Teacher's edition, including one set of twelve tests, answers, and manual of instructions, 25 cents net. There is no available information in regard to the validity and reliability of these tests.


II. Commercial Law

While the field of bookkeeping has been covered fairly well by published test materials, the other social-business subjects do not enjoy a similar advantage.

The subject of commercial law, however, has been covered by published test materials much more fully than some of the other social-business subjects. Certain of the published objective tests, notably the "Commercial Law Achievement Tests" by Peters, Greiner, and Green, are considered excellent from the standpoint of test construction methods. The following published tests are considered the leaders in this field:
1. Commercial Law Achievement Tests, P. B. S. Peters, Lloyd Greiner, and Fred H. Green, South-Western Publishing Co., Cincinnati, Ohio.
Ten unit tests based on Peters and Pomeroy's "Commercial Law". Cost: Set of 10 tests, 24 cents. These tests have gone through many thorough steps of validation. It is safe to say that they are among the best published tests in Commercial Law at the present time. The published reliability coefficients are: .895, .874, .893, .888, .918, .854, .822, .877, .876, and .850.


2. Case Problems and Tests in Business Law, Frederick K. Bentel and Carmen G. Ridiker, Ginn and Co., Boston, Massachusetts, 108 pages.
The book consists of 11 sections based on the major divisions of the subject. Each section is subdivided, providing five types of exercises: true-false, selection, and completion tests, which measure the degree to which the student has grasped the principles of law; and case problems and business judgment tests, which measure his ability to apply the principles to business. Correlates with Huffcut's "Elements of Business Law". Cost: 52 cents. No information in regard to the validity and reliability of these tests is available.


3. Questions and Cases in Business Law, Clyde O.
Thompson, American Book Co., New York, New York. 
Sixty-nine test units based upon the usual subject 
divisions of commercial law. Cost: One set, 
bound in pad form, 40 cents. Manual and key, 36 
cents. No information relative to the validity 
and reliability of these tests is available. 


4. New Burgess' Commercial Law Diagnostic Tests, J. H. Cox, Lyons and Carnahan, Chicago, Illinois, 110 pages.
This book is a group of objective tests designed to diagnose and test the needs of commercial law students. Each is a timed test. Grading keys are available. Cost: 30 cents net. No information relative to the validity and reliability of these tests is available.


5. Rational Objective Tests in Commercial Law, Gregg Publishing Co., New York, New York.
Four final tests for use with any text. Cost: 5 of any one test for 10 cents; specimen set, including the four tests, teacher's key, and instructions, 10 cents. There is no available information about the validity and reliability of these tests.


6. Commercial Law, Harlow Publishing Co., Oklahoma City, Oklahoma.
Test I. Law in General, Property and Contracts.
Test II. Negotiable Instruments, Guaranty and Suretyship, Sales, Personal Property and Bailment.
Test III. Agency, Partnership, Corporations, Insurance and Real Property.
One form. Cost: Single tests, 10 cents; 25 tests and key, 75 cents; 100 tests and two keys, $2.50. No information in regard to the validity and reliability of these tests is available.


III. Junior Business Training 
A number of good published tests are available in this subject. Most of the leading textbooks now have objective tests that accompany them. The important published tests are the following:

1. General Business Training Achievement Tests, H. H. Crabbe and Clay D. Slinker, South-Western Publishing Co., Cincinnati, Ohio.
Eight unit tests based on Crabbe and Slinker's "General Business Training". Sold in units of four tests each, one unit including Tests 1-4 and the other Tests 5-8. Cost per unit, 4 cents. While these tests have been prepared with considerable care, there is no available information relative to their validity and reliability.


2. General Business Training Tests, J. Raymond Smith, South-Western Publishing Co., Cincinnati, Ohio.
Eight unit tests based on Crabbe and Slinker's "General Business Training". Two forms. Cost: 16 cents per set. No information relative to the validity and reliability of these tests is available.


3. Junior Business Training Achievement Tests, Frederick G. Nichols, American Book Co., New York, New York.
Eight test units based upon Nichols' "New Junior Business Training" (Part 1). Cost: One form or set, consisting of 15 copies of the same test unit, 12 cents; 8 sets, covering the 8 test units and assembled in one package, 88 cents. While these tests are considered very important in the subject of Junior Business Training, there is no available information relative to their validity and reliability.


4. Objective Tests in Business Science (Series A), Lloyd L. Jones, Gregg Publishing Co., New York, New York.
Twenty-seven unit tests, 4 semi-final tests, and 2 final tests, based on the contents of "General Business Science" and the accompanying student's workbook, "Projects in Business Science". Cost: Unit tests, 5 of any one test, 5 cents; semi-final tests, 5 of any one test, 10 cents; final tests, 5 of any one test, 10 cents; specimen set, including one copy of each test (33 in all), one set of instructions, and one set of keys, 50 cents. No information relative to the validity and reliability of these tests is available.


5. Objective Tests in Elements of Business Training, John M. Brewer, Floyd Hurlbut, and Juvenilia Caseman, Ginn and Co., Boston, Massachusetts.
Four tests, Series I-IV, for the revised edition of "Elements of Business Training". Cost: 25 copies of any one series, 48 cents. While these tests have been prepared carefully, no information exists relative to their validity and reliability.


6. Zu Tavern's Business Training, Cass, a textbook
test, Commercial Text Book Co., South Pasadena, 
California. 

No information relative to the validity and 
reliability of these tests is available. 
IV. Commercial Geography

The field of commercial or economic geography still offers a rich opportunity for test-makers. Several tests of excellent quality have been published; especially noteworthy are the tests by Dr. H. O. Lathrop. So far as the present writer has been able to ascertain, there is no single set of tests that covers all phases of the subject. Dr. Lathrop's tests accompany Ray H. Whitbeck's textbook "Industrial Geography". An examination reveals that they are based mainly on the first section of the textbook, which deals with the geography of the United States.

A summary of published tests includes the following: 


1. Tests in Lathrop's Laboratory Manual in Industrial Geography, H. O. Lathrop, American Book Co.
Sixteen unit tests, divided into "A" and "B" levels, based on "Industrial Geography" by Ray Hughes Whitbeck. The best published text-book test that exists in Commercial Geography today. There is no available information relative to the validity and reliability of these tests.


2. Tyrrell's Geography Exercises, James F. Tyrrell, The Palmer Co., Boston, Massachusetts.
Fifteen completion tests, ranging in length from 40 to 60 items each, on the geography of the world. More suitable for junior high than senior high use. Cost: Complete specimen set, 20 cents; in quantities for class use, 1 cent per test. No information about the validity and reliability of these tests is obtainable.


3. Industrial and Commercial Geography Tests, John W. Morris, Harlow Publishing Co., Oklahoma City, Oklahoma.
Three tests. Numbers one and two contain three parts each; number three, five parts.
Test I. United States: Location, Production, Industry and Commerce.
Test II. Latin America, Europe, Asia, Africa, and Australia.
Test III. Map Locations.
This test series is especially adaptable to high school use. It was prepared under the auspices of the Oklahoma Council of Geography Teachers, 106 pages. There is no available information about the validity and reliability of these tests.


4. Geography Trade Problems and Practice Tests, Lylyan H. Block, A. J. Nystrom and Co., Chicago, Illinois.
Tests based on Nystrom International Trade Desk Maps; 18 test series, one for each of the important products that figure in international trade. A well-planned series intended primarily for high school use. There is no available information relative to the validity and reliability of these tests.


5. Witham's Standard Commercial Geography Tests, J. L. Hammett Co., Cambridge, Massachusetts.
A test series intended for high school use. There is no available information in regard to the validity and reliability of these tests.

V. Economics
The paucity of good published test material in the subject of economics is especially lamentable. Professor Inglis states that the subject was taught in Massachusetts as far back as 1821.(2) Considering the long period of time over which this subject has been taught, it is amazing that greater strides have not been taken in the development of published objective tests. The one ray of light in a gray sky is afforded by the American Council Economics Test, details of which are given below.

1. American Council Economics Test, Horace Taylor, T. N. Barrows, and Ben D. Wood, World Book Co., Yonkers, New York.
Two forms. Cost: Package of 25 of either form, with manual of directions, key and class record, $1.30 net. Specimen set, 20 cents. Tentative norms have been established for this test. No information is available relative to the validity and reliability of these tests.

(2) Inglis, A., Principles of Secondary Education, Houghton-Mifflin Co., Boston, 1918, pp. 187.
VI. Business English, Business Organization, Salesmanship
and Advertising, History of Commerce, and Banking
So far as the present writer has been able to discover, there are no published tests in the above-listed subjects. In some cases this is due, at least in part, to the nature of the subject; in others, to the unimportant position that the subject holds in relation to other subjects in the curriculum. Certain published tests do, however, cover part of the subject-matter of one or another of these subjects. Consider, for example, the subject of business English. Two published tests are of value in this subject: the first is Leslie Clark's "Letter Writing Test", and the second, D. D. Lessenberry's "Tests on the Parts of the Business Letter", published by the L. C. Smith and Corona Typewriters Company, Inc. In reference to the subject of business organization, H. G. Shields' "An Experimental Business Backgrounds Test" has much to offer the progressive teacher in the way of suggestions for helping him prepare his own informal objective tests in the subject.
I. METHODS OF CHANGING TEST SCORES INTO GRADES
I. The Marking System 

There is a tacit agreement among most teachers that school marks and a definite marking system are a good thing. Most members of the profession accept the marking system of the individual school in which they are employed without demurring in the least. In recent years, however, it has been brought out quite forcefully that certain crudities do exist in individual marking plans, and that some marking systems are distinctly superior to others. Exponents of "progressive education" hold that marks are not necessarily a good thing; in fact, they would favor a complete recasting and reorganization of the marking systems in common usage. Some well-known schools that have been organized as progressive schools have even gone so far as to eliminate the five-point scale of marks (A, B, C, D, and X) and to substitute merely two grades, viz., pass or fail.

At the outset, the present writer wishes to emphasize that he does not favor the abolition of marks but, rather, favors a definite marking system adhered to by all the members of the particular school. It is his contention that school marks have decided motivating values that cannot be ignored. While it is recognized that an extrinsic motive such as pupils working for marks is not necessarily a good thing if carried to extremes, the mere knowledge that a scale of marks is to be used will have the effect of stimulating many pupils to do their utmost. Odell, in common with many other authorities, holds that school marks should be . . .(1) Teachers must remember, however, that school marks are not at all times a complete motivation for a new unit. No effective teacher deviates very far from the principle of trying to motivate his subject by developing an intrinsic interest in the subject-matter itself. The motivating value is only one of the purposes of marking. Crooks says, "A brief yet inclusive summary might mention pupil and parental information, guidance, classification and certification, motivation, and measurement of educational efficiency as the primary aims of all marking systems."(2)
No marking system can be entirely adequate until it has been carefully defined. A particular marking system in use in a given school should be thoroughly understood not only by the teaching staff but also by the pupils and their parents. In many cases, the teacher assumes that parents understand the marking system when, in reality, they are entirely oblivious of even its fundamental elements. This situation compels the teacher to familiarize himself with the school marking system so that he can explain its logic to parents who make inquiry. Ruch says, "After all, any marking system is arbitrary. Without definition it can have no meaning."(3) If members of the teaching staff are in doubt as to certain aspects of the marking system, it is imperative that they consult the school executives in order to clear up the situation. If a new marking system is being planned, it is always good policy to allow the teaching staff to participate in the actual work; the teachers then have the opportunity to challenge anything that seems illogical. Ruch says, "Almost any scheme of recording marks, provided it be adhered to by all teachers in the same school or school system, and provided further that it be understood by all concerned, will prove adequate if there is a valid and reliable provision for the measurement of the relative abilities of the pupils to be marked."(4)

(1) Odell, C. W., Educational Measurement in High School, pp. 459, The Century Co., 1930.

(2) Crooks, A. Duryea, Marks and the Marking System: A Digest, The Journal of Educational Research, Dec., 1935.
II. Absolute versus Relative Marking Systems 

There are undoubtedly many hundreds of different marking systems in use in the United States if we include all the minor variations of the common types. All marking systems reduce fundamentally to two types, viz., systems based upon absolute standards and those based upon relative values. Odell concludes that the former are much more common.(5)
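The text does not describe how a relative system actually assigns marks. One common form, assumed here, ranks the class and gives each letter to a fixed share of pupils. The shares (10, 20, 40, 20, and 10 per cent), the letters A to E, and the function name `relative_marks` are all illustrative assumptions, not figures taken from the source.

```python
def relative_marks(scores, shares=(0.10, 0.20, 0.40, 0.20, 0.10), letters="ABCDE"):
    """Assign letter marks by rank within the class (a relative system).

    scores:  dict mapping pupil name to test score
    shares:  assumed fraction of the class receiving each letter, best first
    letters: assumed letter labels, best first
    """
    if len(shares) != len(letters) or abs(sum(shares) - 1.0) > 1e-9:
        raise ValueError("shares must match letters and sum to 1")
    ranked = sorted(scores, key=scores.get, reverse=True)  # best score first
    n = len(ranked)
    marks = {}
    cumulative = 0.0
    start = 0
    for share, letter in zip(shares, letters):
        cumulative += share
        end = round(cumulative * n)  # last rank position for this letter
        for pupil in ranked[start:end]:
            marks[pupil] = letter
        start = end
    return marks
```

Because the marks depend only on rank, the same raw score may earn different letters in different classes. As written, the sketch may also split pupils with tied scores across two letters, so a fuller version would keep ties together.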
(3) Ruch, G. M., The Objective or New-Type Examination, pp. 376, Scott, Foresman and Co., 1929.
(4) Ibid, pp. 377.

(5) Odell, C. W., High School Marking Systems, School Review, Vol. XXXIII, 1925, pp. 346.
The first marking system to be considered is the percentage grading plan. This plan presupposes the grading of papers on a scale ranging from 0 to 100, or 101 levels of pupil accomplishment in all. The chief advantage of such a plan is that teachers are familiar with it. Ruch points out that: "The greatest weakness of this system is that its sheer familiarity leads to an uncritical acceptance without conscious regard to the inherent assumptions."(6) Uncritical acceptance of the percentage system implies one of the two following things:
1. That the examiner thinks he can distinguish between 101 levels of pupil accomplishment; or
2. That, either through ignorance or inertia, he does not care whether his marks are misleading because they indicate an apparent accuracy which really does not exist.

Absolute systems of marking assume absolute standards
of judgment. They attempt the precision of physical
measurements, with their very definite units such as
hours, pounds, etc. Marking systems have not been worked
out to the extent that fine distinctions may be drawn
between the relative accomplishments of two pupils who for
all practical purposes have attained the same degree of
mastery in a specific subject. For example, suppose that
the percentage system is used in a given school and
that one pupil is graded 86 and another 85. It appears on
the face of things that the first pupil is superior to
the second. The basic criticism of this situation is,
according to Ruch: "A large body of experimental
evidence points to the fact that from but five to seven
levels of ability are ordinarily recognizable by teachers
in marking pupils. . . . The difference
between an 85 and an 86 is a difference at least five times
as fine as the human judgment can ordinarily . . . ." (7)

(6) Ruch, G. M., The Objective or New-Type Examination,
pp. 370, Scott, Foresman and Co., 1929.

Another potent criticism levied against the percentage
system concerns failures. If the percentage
system is in use, there is one percentage point, known as
the "passing" mark, which discriminates between passing and
failing. If the passing mark is 70%, a pupil receiving a
grade of 69% does not pass and, consequently, must take the
course over again. It is no defense to say that the line
must be drawn somewhere. Some educators refuse to pass
pupils unless they have mastered the subject-matter completely.
Carlson states that: "It is absurd to defend a fixed passing
mark which permits some poor pupils to carry on and forces
others who are but infinitesimally poorer to repeat the
entire course from the beginning." (8)

The second general type of marking system employs the
idea of the normal curve of distribution. Professor Max
Meyer of the University of Missouri was the originator
of this method. According to his plan, pupils were to
be marked on the basis of relative, not absolute,
achievement. Chance or probability is the underlying logic.
It has been noted that chance phenomena tend to form a
symmetrical bell-shaped curve. In biology, it is known
as the biological curve. In statistics it is known as
the normal frequency curve. A marking system built upon
it is known as the "Missouri Plan".

(7) Ibid, pp. 373.

(8) Carlson, Paul A., The Measurement of Business
Education, pp. 25, South-Western Publishing Co., 1932.

The normal distribution curve is a useful guide in
converting test scores into grades. Most human
traits, including mental ability and school achievement,
seem to approximate the normal distribution curve. There
is an accumulation of cases about average ability. This
accumulation clusters around the middle part of the curve.
From this point the curve slopes off toward the upper and
lower extremities, indicating a diminishing number of
persons as the distance from the average increases. If
the test scores of a large number of pupils are taken,
a distribution approximating the normal curve should be
expected.

Based upon the normal curve, it is possible to compute
the number of pupils who will fall into a particular grade
group. For a five-point or quintile system of grading,
mathematicians have computed that 6% of the
pupils should be expected to receive A, 25% should receive
B, 38% should receive C, 25% should receive D, and 6%
should receive X (failure). In a six-point or sextile system, the
proportions would be 3% A's, 16% B's, 31% C's, 31% D's,
16% E's and 3% F's. In practice, it will be found that
there are many variations of the proportions that have
been stated.
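The quintile figures can be checked against the normal curve itself. The sketch below assumes that each of the five grade bands is one standard deviation wide, with the C band running from half a standard deviation below the mean to half a standard deviation above it; that arrangement is an assumption here (it matches the sigma method described later in this section), not something the passage states.

```python
from math import erf, sqrt

def normal_cdf(z: float) -> float:
    """Proportion of a standard normal curve lying below z (z in S D units)."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def five_point_proportions() -> dict:
    """Percentage of cases in five grade bands, each one S D wide,
    with the C band centred on the mean (-0.5 to +0.5 S D)."""
    cuts = [-1.5, -0.5, 0.5, 1.5]          # assumed band boundaries in S D units
    below = [normal_cdf(z) for z in cuts]  # share of cases below each boundary
    return {
        "A": 100 * (1.0 - below[3]),
        "B": 100 * (below[3] - below[2]),
        "C": 100 * (below[2] - below[1]),
        "D": 100 * (below[1] - below[0]),
        "X": 100 * below[0],
    }

for grade, pct in five_point_proportions().items():
    print(f"{grade}: {pct:.1f}%")
```

Under this assumption the curve gives roughly 6.7, 24.2, 38.3, 24.2, and 6.7 per cent, which is close to the rounded 6-25-38-25-6 proportions quoted above.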

III. Methods for Changing Test Scores into Grades

A common fallacy in written examinations is the
failure to differentiate between a test score and a grade.
Some teachers think that the terms "score" and "grade"
are synonymous. It is highly important that this error be
avoided. The score is the number of points a pupil makes
upon his examination. For example, in a new-type
examination consisting of one hundred fifty items, the pupil
might answer ninety-eight correctly. His score would be ninety-eight.
The pupils are not particularly interested in their scores.
They ask, "To what grade is ninety-eight equal?" Lang
explains that: "The grade is the interpretation of the
score according to some standard or . . . ." (9) The grade
is intended to show the pupil's relative achievement in
respect to that of the rest of the group.

(9) Lang, Albert R., Modern Methods in Written Examinations,
pp. 246, Houghton-Mifflin Co., 1930.

a. Proportional Method

Many teachers use this method for the conversion
of test scores into grades. According to this plan,
the test papers are arranged by score from the highest
to the lowest. The next step consists
of computing the number of papers that will fall into each
grade group. If a five-point system of marking is followed
in the individual school system, the proportions 6-25-38-25-6
would be used. The best 6% of the papers would be marked A,
the next 25% would merit B, the middle 38% would get C,
the next 25% would receive D, and the remaining 6% would be
graded X, or failure.
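The steps of the proportional method can be sketched as follows. This is a minimal illustration, assuming the 6-25-38-25-6 proportions and rounding each cumulative share to the nearest whole paper; the function name `proportional_grades` is hypothetical. Ties that straddle a boundary are split arbitrarily, which is exactly the border-line case where, as the next paragraph notes, the teacher's judgment must come in.

```python
def proportional_grades(scores, proportions=(6, 25, 38, 25, 6),
                        letters=("A", "B", "C", "D", "X")):
    """Rank papers from highest to lowest score and hand out letter
    grades in the given percentage proportions."""
    ranked = sorted(scores, reverse=True)
    n = len(ranked)
    grades = []
    cumulative = 0
    for pct, letter in zip(proportions, letters):
        cumulative += pct
        # Papers up to this cumulative share of the class receive this letter.
        cutoff = round(n * cumulative / 100)
        grades.extend([letter] * (cutoff - len(grades)))
    return list(zip(ranked, grades))
```

For a class of 35 this rounding gives 2 A's, 9 B's, 13 C's, 9 D's, and 2 X's.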

There are a few warning notes that should be sounded
in regard to this system of marking. First, the normal
curve is only a guide; consequently, the teacher will have
to make some subjective decisions in the conversion process.
Suppose that a pupil is a border-line case and that,
according to his test score, he might be placed in either
the B or the C grouping. The teacher must review in his own
mind the other work of this pupil in order to decide to
which grouping the pupil rightfully belongs. As this is
a subjective judgment, there is no surety that an absolutely
fair decision will be reached. Secondly, the application
of the proportional method to the scores of a particular
class does not ensure perfectly fair grading. Consider
the situation when a class consists of only fifteen or
twenty pupils. Such a small class need not follow the
normal curve at all, yet if a teacher follows the curve
mechanically, the lowest 6% will still be failed, and
such a procedure may well be unfair to the poorer members of
the class. After all, the proportional method works best
where large numbers of scores are being converted into
grades.

b. Sigma Method 

This method is one of the best procedures for changing
test scores into grades. Lang says, "This method should be
mastered by all who wish to perfect themselves in the
examination . . . ." (10) At first the statistical work
involved will cause trouble to teachers not grounded in
statistics, but gradually, as they learn the procedure, they
will be impressed with its reasonableness.

The sigma method involves the use of the standard
deviation in educational statistics. The standard deviation,
abbreviated S D or σ, is usually explained in reference to
the mean. The mean is the point or value around which the
scores tend to group themselves, whereas the standard
deviation measures how widely the scores tend to spread out
on either side of the mean. In working out problems on the
sigma method, it is considered easier to calculate the
standard deviation from a guessed mean by using the formula:

$$S\,D = \sqrt{\frac{\sum fa^2}{N} - \left(\frac{\sum fa}{N}\right)^2}$$

where f is the frequency of a score, a is the deviation of
that score from the guessed mean, and N is the number of scores.

(10) Ibid, pp. 254.

At this time it would be appropriate to consider
examples showing the computation of sigma. An explanation
of the two common methods for figuring the S D will be
given in the following pages.

In the following example, it is assumed that the scores
are for a class of average size, viz., 35 pupils. The test
from which these scores came contained a total of 46 items.
The spread in the scores, the difference between the highest
and the lowest, is 38 points. The greatest concentration
in the scores is around 25; consequently, this point will be
taken as the guessed mean. The (X) column represents the
scores ranging from the highest to the lowest; the (f)
column, the number of times each score occurs. The amounts
in the (a) column are computed by subtracting the guessed
mean from the score and indicating whether the difference
is positive or negative. Amounts in the (fa) column are
obtained by multiplying the amounts in the two preceding
columns together. Amounts in the (fa²) column are found by
multiplying the amounts in the (f) column by the squares of
the amounts in the (a) column. The remainder of the problem
consists of making the proper substitutions in the appropriate
formulas and then working them out.
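The column work just described can be sketched in a few lines. The study's own table of 35 scores is not reproduced here, so the scores in the usage note are invented for illustration only. Besides the S D formula above, the sketch assumes the usual step of correcting the guessed mean by the average deviation, M = GM + Σfa / N; the function name is hypothetical.

```python
from collections import Counter
from math import sqrt

def mean_and_sd_from_guessed_mean(scores, guessed_mean):
    """Mean and S D by the guessed-mean method: work with each score's
    deviation (a) from a guessed mean, weighted by its frequency (f)."""
    freq = Counter(scores)            # the (f) column, keyed by score (X)
    n = sum(freq.values())            # N
    sum_fa = 0
    sum_fa2 = 0
    for x, f in freq.items():
        a = x - guessed_mean          # the (a) column, with its sign
        sum_fa += f * a               # the (fa) column
        sum_fa2 += f * a * a          # the (fa²) column
    correction = sum_fa / n           # how far the guess misses the true mean
    mean = guessed_mean + correction
    sd = sqrt(sum_fa2 / n - correction ** 2)
    return mean, sd
```

For example, `mean_and_sd_from_guessed_mean([12, 18, 20, 22, 24, 25, 25, 26, 28, 31, 36], 25)` works from a guess of 25 and returns the same mean and S D as a direct calculation; a poorer guess only changes the size of the correction, not the result.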


Once the mean and standard deviation have been
determined, it is comparatively easy to change the scores
into grades. The mean of 24 is taken to be the middle of
the C group. The standard deviation of 9 is taken to be
the number of consecutive scores to be included in each
grade group.

The C group will consist of the score 24 with four scores
above it and four below. As there are 9 scores in each grade group,
the C group will consist of 20, 21, 22, 23, 24, 25, 26, 27,
28. The B group will include the next nine scores above
the C group, viz., 29, 30, 31, 32, 33, 34, 35, 36, 37; the
A group, the scores just above, viz., 38, 39, 40, 41, 42,
43, 44, 45, 46; the D group, the scores just below the C
group, viz., 19, 18, 17, 16, 15, 14, 13, 12, 11; and the
(X) or failing group, the scores of 10 and below.
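The grade bands above follow from a simple rule: the C band runs from half an S D below the mean to half an S D above it, and each further band is one S D wide. A minimal sketch, assuming that a score falling exactly on a boundary goes to the band nearer the mean (with a mean of 24 and an S D of 9 no whole-number score lands on a boundary, so the choice does not matter here):

```python
def sigma_grade(score, mean, sd):
    """Letter grade by the sigma method: C band centred on the mean,
    every band one standard deviation wide."""
    z = (score - mean) / sd   # distance from the mean in S D units
    if z > 1.5:
        return "A"
    if z > 0.5:
        return "B"
    if z >= -0.5:
        return "C"
    if z >= -1.5:
        return "D"
    return "X"
```

With the figures of the example, `[sigma_grade(s, 24, 9) for s in (10, 11, 19, 20, 28, 29, 37, 38)]` reproduces the group boundaries listed above: X, D, D, C, C, B, B, A.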

Even pupils entirely unversed in statistics recognize 
the reasonableness of this method. The greatest criticism 
that the present writer has ever heard directed against it 
by teachers in service is that it takes too much time.
Many teachers prefer the proportional method to the sigma
method for this reason.

We are now ready to consider the second problem on the
computation of the S D. The method of using ungrouped scores
has already been explained; now we shall take up the method
involving grouped measures.

The explanation for this second problem is very much
like that for the preceding one. The main difference
between the problems is that in the first the scores are
not grouped, but in the second they are. Grouping of scores
must be resorted to when there are many scores to handle
and when the range between the highest and lowest scores
is very great. In this problem, the range is 90 points.
It will be noted, too, that the frequency includes 100 scores,
whereas the preceding problem contained only 35.

In grouping, it is necessary to plan for between 12 and
20 groups. (11) It is considered best, also, to have each group
consist of an odd number of scores so that the middle score
can be taken as the midpoint of the group. In the problem
under consideration, the range is 90 points. If this is
divided by 7, the quotient is 12 with 6 as a remainder, so
thirteen groups of seven scores each will cover the whole range.
Such a grouping satisfies the requirements that
Professor Lang suggests. After the grouping, the amounts
for the various columns are determined. Once the totals
are obtained, it is easy to substitute in the necessary
formulas and so arrive at the values of the mean (M) and
the standard deviation (SD).
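The grouped computation can be sketched with the common step-deviation shortcut: deviations are counted in whole groups rather than in score points, and the results are multiplied back by the group width. This is an illustration under stated assumptions, not the study's worked table: groups are assumed to start at the lowest score, every score is treated as lying at its group's midpoint, and the function name and sample scores are hypothetical.

```python
from collections import Counter
from math import sqrt

def grouped_mean_and_sd(scores, lowest, width, guessed_group):
    """Mean and S D from grouped scores. Scores fall into groups of
    `width` consecutive values starting at `lowest`; each score is
    treated as if it lay at its group's midpoint. `guessed_group` is
    the index of the group whose midpoint serves as the guessed mean."""
    f = Counter((x - lowest) // width for x in scores)  # group index -> frequency
    n = sum(f.values())
    # With an odd width the midpoint is a whole score, as recommended above.
    guessed_mean = lowest + guessed_group * width + (width - 1) / 2
    # Deviations (a) are measured in groups, not in score points.
    sum_fa = sum(freq * (g - guessed_group) for g, freq in f.items())
    sum_fa2 = sum(freq * (g - guessed_group) ** 2 for g, freq in f.items())
    c = sum_fa / n
    mean = guessed_mean + width * c
    sd = width * sqrt(sum_fa2 / n - c * c)
    return mean, sd
```

With a range of 90 points, say from 5 to 95, and a width of 7, the group indices run from 0 to 12, which is the thirteen groups mentioned above. Because each score is replaced by its group midpoint, the results differ slightly from those computed on the ungrouped scores.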

(11) Ibid, pp. 261.

c. The Morrison Marking System

Professor Morrison of the University of Chicago has
long been dissatisfied with existing marking systems. The
system he suggests is based on the idea of "pupil mastery".
He deplores the practice of passing pupils who have merely
made a score of 70 per cent on their written examinations.
He contends that teachers are not justified in passing
pupils who have not completely mastered the subject-matter.

His view of the rank-in-class (relative) methods of
marking is equally unfavorable. Professor
Morrison says, "Appraisal by rank-in-class is therefore
badly calculated to identify and measure the real
educational product. Worse than that, it seems to have an
essentially anti-educational tendency. . . . In the place
of inward satisfaction in growth attained,
of which the individual can be certain, it substitutes the
restless ambition to surpass one's . . . ." (12)

While the above viewpoint is logical if Professor 
Morrison's premises are accepted, the present writer prefers 
to accept the viewpoint of Ruch, Odell, Symonds, Lang, 
Carlson, and others. 

In the Morrison system of marking, the grade scale of
A, B, C, D, and X would be eliminated. He suggests that an
entirely different type of mark be given. His marks would
not consist of merely one letter or symbol, but would
include additional information. He says, "If we desire
further a performance record, we may enter M110 N and
agree that the expression means that we have evidence of
mastery, plus a skill rated at 110 on a standardized test,
plus evidence of cultural . . . ." (13)

(12) Morrison, Henry C., The Practice of Teaching in the
Secondary School, pp. 72, The University of Chicago Press,
2nd Edition, 1930.

d. The Percentile Ranking System

Much merit is claimed for this system by its advocates.
Carlson calls it "the newest system of . . . ." (14) According
to this method, the pupils are first listed in rank order
and then these ranks are translated into percentiles. For
example, suppose we have a group of 12,000 pupils taking a
state-wide bookkeeping examination in Connecticut. The
top one per cent of the papers, or 120 papers, would be given a
rank of 99. The next 120 papers would be given a 98, and
so on down. If a pupil received a percentile rank of 75,
he would know that he excels 74% of the pupils and that he
is excelled by about one-fourth of them.
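The ranking scheme in the Connecticut example can be sketched as follows. It is a minimal illustration, assuming the papers are simply ranked from highest to lowest with ties broken arbitrarily; the function name is hypothetical. Each successive one per cent of the papers receives a rank one lower than the one before, starting from 99, so the lowest one per cent receives 0.

```python
def percentile_ranks(scores):
    """Top one per cent of papers get 99, the next one per cent 98,
    and so on down. Returns (score, percentile rank) pairs."""
    ranked = sorted(scores, reverse=True)
    n = len(ranked)
    # position * 100 // n counts how many whole hundredths of the group
    # lie ahead of this paper.
    return [(score, 99 - (position * 100) // n)
            for position, score in enumerate(ranked)]
```

With 12,000 papers, positions 0 through 119 receive 99 and positions 120 through 239 receive 98, matching the figures in the example.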


(13) Ibid, pp. 80. 


(14) Carlson, Paul A., The Measurement of Business 
Education, pp. 25, South-Western Publishing Co., 1932. 

J. THE PROBLEM OF ABILITY GROUPING

One of the most difficult problems a teacher faces
is the teaching of a class composed of pupils of varying
mental abilities. In schools where there is no proper
classification of pupils, it is not an uncommon experience
for a teacher to be required to handle a class consisting
of both dull and bright children thrown in together. The
problem might be further aggravated by the inclusion of a
few disciplinary or problem children, and some moral or
emotional misfits. When the teacher is confronted with
such a hodge-podge, he must use individual instruction
methods to a large degree. If the pupils in this class
could be separated into different classes so that each
would contain approximately the same kind of children,
the teaching problem would be simplified. Ability grouping
aims at placing pupils of approximately the same mental
ability in the same class. Cubberley
says, "What every school principal desires to give to every
teacher is as homogeneous a working group of pupils as can
be . . . ." (1) It is almost an impossibility, however, to get
perfectly homogeneous sectioning of pupils in the ordinary
high school because of administrative difficulties.


Individual instruction methods have proved a boon in
the teaching of ill-assorted groups. Ability grouping,
or homogeneous sectioning, really aims at the same thing
but on a much larger scale. Brueckner and Melby say,
"Ability grouping is really a mass production method for
adapting instruction to individual . . . ." (2) The
problem of how to reach the dull children effectively and
yet not bore the others is a vital problem in many
classrooms. Alberty and Thayer suggest that: "Various plans
such as ability grouping, individualized instruction, and
the coaching of laggards, all have their enthusiastic
advocates, but in spite of much experimentation designed
to demonstrate the effectiveness of particular procedures,
we are forced to admit that the problem of how to adapt
the schools to the individual differences of pupils is
still a vital . . . ." (3)

(1) Cubberley, Ellwood P., An Introduction to the Study
of Education, pp. 251, Houghton-Mifflin Co., 1926.

Homogeneous sectioning is intended to render more
effective the instruction of bright children as well as
of dull children. It has proved especially useful, however,
with the dull groups. Dr. Baker calls attention to
the following main defects in the general program of
education:

"1. The lack of proper psychological methods of
instruction for dull pupils and for bright
pupils, too, so far as that goes.

2. The expectation of achievement on the part of
the dull equal to that of average pupils of
the same age.

3. The inadequate provision of courses of study
which are definitely adapted to the needs of
dull pupils or of bright pupils.

4. The lack of proper segregation of pupils so
that the first three features listed above
may be carried out in an efficient manner." (4)

(2) Brueckner, Leo J. and Melby, Ernest O., Diagnostic and
Remedial Teaching, pp. 27, Houghton-Mifflin Co., 1931.

(3) Alberty, H. R. and Thayer, V. T., Supervision in the
Secondary School, pp. 291, D. C. Heath and Co., 1931.


No one of the above defects is irremediable if the proper
approach to its solution is taken.

Now the question arises: "How may pupil classification
be effected so as to yield the maximum benefit?" A study
of the evidence shows that there is no one method, and that
the number of methods advanced varies directly with the
number of writers in the field. All evidence points,
however, to some general principles. Hildreth warns that:
"In progressive schools pupil classification is never
haphazard nor . . . ." (5) Classification, in other
words, must be based on a carefully evolved plan. Another
important caution that must be heeded is that the
segregation of pupils into homogeneous sections must not be
based solely on one criterion. Alberty and Thayer emphasize
the following: "It would appear that there is no royal road
to homogeneous sectioning. The only safe plan is to
consider, not a single measure, but all the evidence which is
available, and to provide fully for constant shifting
from one section to another upon the discovery of apparent
. . . ." (6) It is not intended that the above should
excuse even a busy school staff from adopting measures to
bring about adequate pupil classification. An effective
classification of pupils will go far toward eliminating the
tremendous educational waste represented by pupil failure
and repetition of courses.

(4) Baker, Harry J., Characteristic Differences in Bright
and Dull Pupils, pp. 30, Public School Publishing Co., 1927.

(5) Hildreth, Gertrude H., Psychological Service for School
Problems, pp. 183, World Book Co., 1930.

Some schools still group pupils for instructional
purposes according to chronological age. From the viewpoint
of many authorities, such a practice is indefensible. In
many schools, there is no clear, well-formulated plan
underlying pupil classification. Van Wagenen says, "More and
more the principle of classification has given way to a
trial and error process of grade retention, promotion, and
acceleration, without any well defined . . . to replace
the one in the process of being discarded." (7) The present
writer contends that a carefully developed plan of pupil
classification should be adopted by the school faculty and
then faithfully observed.

One of the best lists of criteria for classification
has been prepared by Professor Hildreth. It must be
remembered, however, that she is referring mainly to the
elementary school. She says, "The possible bases for
pupil classification are as numerous as the characteristics
of the children to be classified. . . . Progressive
educators consider the following criteria the most
important for grade placement, grade sectioning, and
promotion, although opinion varies concerning the weight
to be accorded any one factor:

1. The pupil's probable rate of mental development
2. Level of mental maturity
3. Predicted progress in one or more of the tool subjects
4. Level of achievement reached in any one or more of these skills
5. Chronological age
6. Social and emotional maturity
7. Physiological . . . ." (8)

(6) Alberty, H. R. and Thayer, V. T., Supervision in the
Secondary School, pp. 291, D. C. Heath and Co., 1931.

(7) Van Wagenen, M. J., Educational Diagnosis and the
Measurement of School Achievement, pp. 119, The
MacMillan Co., 1926.

Symonds suggests a list of bases for homogeneous
sectioning that would apply more directly to the high school.
He says, "There are four bases possible for homogeneous
sectioning:

1. Present status in the subject
2. Present general ability
3. Predicted status in the subject
4. Rate of learning." (9)

(8) Hildreth, Gertrude H., Psychological Service for School
Problems, pp. 185, World Book Co., 1930.

(9) Symonds, Percival M., Measurement in Secondary Education,
pp. 485, The MacMillan Co., 1930.

In sectioning on the basis of present status, the scores
on an achievement test could be used. For the second,
scores on one or two intelligence tests could be
taken. For the third, a prognosis test will have to be
used. For the fourth, the rate or speed with which progress
will be made in the subject, the scores on successive
achievement tests will have to be compared.
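The fourth basis, rate of learning, can be illustrated with a small sketch: take the gain between two successive achievement tests, rank the pupils on it, and divide them into sections of nearly equal size. Everything here is a hypothetical illustration (the field names `first` and `second`, the function name, and the choice of simple gain as the measure). As Alberty and Thayer warn above, an actual sectioning plan should weigh all the available evidence rather than a single measure, and should allow pupils to shift between sections.

```python
def section_by_measure(pupils, measure, sections):
    """Rank pupils on one measure, highest first, and split them into
    `sections` groups whose sizes differ by at most one pupil."""
    ranked = sorted(pupils, key=measure, reverse=True)
    size, extra = divmod(len(ranked), sections)
    result, start = [], 0
    for i in range(sections):
        end = start + size + (1 if i < extra else 0)  # earlier sections absorb the remainder
        result.append(ranked[start:end])
        start = end
    return result

# Hypothetical measure: gain from the first to the second achievement test.
def gain(pupil):
    return pupil["second"] - pupil["first"]
```

Calling `section_by_measure(pupils, gain, 2)` on five pupils yields one section of three and one of two, with the pupils who gained most in the first section.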

Let us consider at this point the advantages claimed
for the practice of ability grouping. Probably the most
potent argument that can be adduced is that homogeneous
grouping makes for more efficient pupil learning. Odell
says, "Most of the arguments in favor of homogeneous
grouping may be united into one, that it makes for more efficient
learning and, as a result, better achievement, on the part
of the pupils. In most cases the basis for this argument
is theoretical and not . . . ." (10) A very important
factor is that it makes the work of the teacher easier. If
the teacher has pupils of approximately the same abilities
in his class, he can plan his instruction to meet the level
of the group. With a heterogeneous class, the teacher must
prepare to meet the levels of the different elements, and
his effort is diffused to such an extent that the
presentation is ineffective.


(10) Odell, C. W., Educational Measurement in High School, 
pp. 502, The Century Co., 1930. 

Homogeneous sectioning has been attacked from a
number of angles. Bagley has been one of its strongest
critics, and Terman one of its staunchest
defenders. Bagley's theory of educational determinism has
been frequently cited. Symonds calls attention to it in
these words: "Segregation on the basis of ability means
differential educational treatment and the very act of
placing a pupil in one or another . . . for a
fatalism or determinism in his education." (11) This theory
holds that the placing of a pupil in a "slow" section
brands him as a dullard and forever dooms him to the
limited opportunities existing in his group. The force
of this indictment is diminished if provision is made for
shifting pupils from one group to another when they show
improvement.

Great care must be taken in sectioning to avoid the
use of such terms as "dull group". The pupils in the
different sections should not be made to realize that they
have been put in the slow group because of their low
attainment. The trouble is, however, that pupils in the
secondary school will soon sense what has happened
even if they do not understand the full details. It is
difficult, nay impossible, in most schools to make an
administrative change and not have them get at the reason
sooner or later. Nevertheless, the attention of the class
should never be focussed upon the fact that they are the
dull group. Brewer suggests the following: "Classification
or sectioning by ability should proceed with great caution
and must always be tentative; furthermore, particular care
should be taken not to use such expressions as 'low level',
'dull pupils', 'inferior children', and 'low group'.
Neither should sectioning prevent the free contact of all
kinds of children with each other, at least in student
activities, athletics, auditorium exercises, music, and the
like, which should be used to integrate the student body
in preparation for later life activities, where of course
persons of a variety of intellectual levels are . . . ." (12)

(11) Symonds, Percival M., Measurement in Secondary
Education, pp. 477, The MacMillan Co., 1930.

Ability grouping has often been criticized as being
undemocratic. It is argued that democratic conditions do
not prevail in the classroom if any attempt has been made
to segregate pupils according to their relative mental
abilities. Another objection occasionally voiced is that
the dull need the stimulus of the bright. The objection
that gifted pupils will tend to overwork when grouped
together is frequently cited. Besides all these, it is
said, at times, that homogeneous sectioning is not the best
policy because dull pupils learn a great deal from the
recitations of the bright pupils.


(12) Brewer, John M., Education as Guidance, pp. 581, The
MacMillan Co., 1932.

K. THE GROUP INTELLIGENCE TEST AS AN AID TO THE COMMERCIAL
TEACHER
I. Development of the Group Test

Mental testing is of rather recent development, even though
its roots may be traced back to a distant past. Earlier
in this study (pages 16 to 20 inclusive), a short historical
account of the intelligence test was presented. The
epochal work of the French genius, Alfred Binet, gave a
tremendous momentum to the intelligence-testing movement. His
work is of great importance historically because so much of
the later development of intelligence testing can be traced
to ideas that he originated. Shortly after 1900, Binet and
his co-worker, T. Simon, set to work in connection with the
public school system of Paris on the problem of picking out
those children who were likely to fail to profit from their
school work. As a result of their studies, the famous
Binet-Simon scale, consisting of thirty tasks or exercises,
was published in 1905. These tasks were arranged in order
of increasing difficulty, but were not grouped according to
mental age.

After further experimentation during the next few years,
Binet revised the original scale in 1908. The important
point to note about this revision is that the tests were now
grouped according to their appropriate ages. In 1911, the
year of his death, his last important article on mental
tests appeared, and it contained a further revision of his
scale. This revision differed from the 1908 scale in the
arrangement of tests and in the allotment of tests to each
age. Some new tests were included and some of the old were
dropped, with the result that the revised scale comprised
fifty-four tests in all. It is evident that Binet's work
can be traced to his interest in abnormal psychology. The
practical sociological problem of how best to help various
defective and delinquent classes was of great interest to
him. Pintner says, "If, in the history of psychology, we
call Wundt the father of experimental psychology, we must
then call Binet the father of intelligence . . . ." (1)

In the beginning, the mental testing movement in America
was based upon the beliefs of "functional psychology". The
work of Cattell with the Columbia University freshmen in 1890
illustrates this point. Ragsdale says, "Cattell used
essentially the assumptions of the functional psychology which
were to be explicitly formulated only a decade later. This
school of psychological thought believes that mind can be
understood as being composed of a large number of functions
or ways of . . . ." (2) It was logical under these conditions
for psychologists to attempt to measure each of the mental
functions separately. When Binet began his work, he accepted
the psychological assumptions in vogue, viz., that there
were many mental functions, each of which it was desirable
to test. In addition, however, he made an assumption which
radically changed the character of the tests. Ragsdale
says, "Binet assumed that mental functions of all kinds
develop at approximately the same rate. This assumption
of Binet's made it unnecessary to be greatly concerned about
just which mental function was being measured by any given
. . . ." (3)

(1) Pintner, Rudolph, Intelligence Testing: Methods and
Results, pp. 32, Henry Holt & Co., 1931.

(2) Ragsdale, Clarence E., Modern Psychologies and
Education, pp. 215, The Macmillan Co., 1932.

Even before the publication of the Binet-Simon Scale,
Dr. Lewis M. Terman of Stanford University had been working
on the problem of individual differences among school
children, and when Goddard brought out the first American
publication of the Binet tests in 1908, Terman seems to have
become interested in Binet's method. During the years 1910
and 1911 Terman and Childs tentatively revised the Binet
1908 Scale; this revision was published in 1912. Pintner
holds that: "Terman considered this merely a tentative
revision, because his experience with the Scale so far had
shown him the great possibilities in the way of further
extension and more complete standardization." (4) During the
next five years Terman and his co-workers occupied
themselves with the revision, extension, and standardization of
the Binet Scale, and their final results certainly justified
this expenditure of time.


(3) Ibid, pp. 217. 


(4) Pintner, Rudolph, Intelligence Testing: Methods and 
Results, pp. 40, Henry Holt & Co., 1931. 
The Stanford Revision does not contribute anything that
is essentially new to Binet's ideas. Binet originated the
method; Terman worked to perfect it. A more complete
standardization of the tests was effected. In his work,
Terman found the need of a new statistical term in order to
express his findings properly. This new term was the
Intelligence Quotient (I.Q.), which was later to become so
important. It is noteworthy, however, that this concept was
not original to Terman, although its subsequent wide use can
be traced directly to his adoption of it. Kelley says, "Stern
in 1912 was the first to use in print the term 'mental
quotient', meaning thereby the mental age divided by the
chronological age. Kuhlman independently, in the spring of
1912, hit upon the same device, and published a little later.
The concept here discussed is now the familiar Intelligence
Quotient. Terman has adopted the term and investigated the
concept. As a result of these studies it appears that one's
intelligence quotient is, at least to quite a marked degree,
constant throughout . . . ."(5) "The Stanford Revision by
Terman in 1916 is the best known and is today the standard
instrument for individual . . . ."(6)
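Kelley's ratio can be written as a formula. In Terman's usage the quotient is multiplied by 100, so a child whose mental age equals his chronological age has an I.Q. of 100. The factor of 100 is Terman's convention and does not appear in the quotation above. The short Python sketch below assumes both ages are given in months. The function name ratio_iq and the sample child are illustrative only.

```python
def ratio_iq(mental_age_months: float, chronological_age_months: float) -> float:
    """Ratio I.Q.: mental age divided by chronological age, times 100."""
    if chronological_age_months <= 0:
        raise ValueError("chronological age must be positive")
    return 100 * mental_age_months / chronological_age_months


# A hypothetical child of 10 years 0 months (120 months) who earns a
# mental age of 12 years 0 months (144 months).
print(ratio_iq(144, 120))  # 120.0
```

A quotient above 100 indicates mental development ahead of the child's age. One below 100 indicates development behind it.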

Since 1915 there has been an important shift in psychological
theory from the functional to the behavioristic


(5) Kelley, Truman Lee, Interpretation of Educational 
Measurements, pp. 5, World Book Co., 1927. 


(6) Monroe, Walter S., DeVoss, James C., and Reagan, George
W., Educational Psychology, pp. 266, Doubleday, Doran
& Co., Inc., 1930.
psychology. The behavioristic or objective psychology is
not interested in attempting to determine the strength of
any individual function or capacity, but instead attempts
to obtain samples of behavior in the hope that from these
samples the child's intelligence may be inferred. Ragsdale
says, "We are taking small samples of behavior in the hope
that by using them we shall be able to estimate the child's
present behavior status and make predictions concerning the
kind of behavior which he may be expected to show in the
future."(7)

The present-day plan of measuring intelligence is to
ask the child to give an observable performance and then
to infer his intelligence from the results obtained. It
must be emphasized that we do not measure intelligence
directly. That we cannot measure intelligence with complete
accuracy may be deduced from the following statement:
"Strictly speaking, we do not measure intelligence. We
measure certain achievements, and from the results obtained
we infer the status of the child's intelligence."(8)

The Binet-Simon Scale and the Stanford Revision of it
are both individual tests and can be given to only one pupil
at a time. From the standpoint of practical school use,
there is one great disadvantage in the individual intelligence
test. Expressed in the words of Adams and Taylor:


(7) Ragsdale, Clarence E., Modern Psychologies and
Education, pp. 222, The Macmillan Co., 1932.


(8) Monroe, Walter S., DeVoss, James C., and Reagan, George
W., Educational Psychology, pp. 265, Doubleday, Doran &
Co., Inc., 1930.


"In all of these tests, it was soon recognized that they
were very costly of time."(9) The time element is a very
important factor if thousands of pupils are to be tested.

Psychologists were at first hesitant to accept the results of
group mental tests because they felt that these instruments
were very inaccurate in comparison with the individual tests.
The group test was slow in arriving and in establishing itself
as a legitimate method for the measurement of mental ability.
Pintner says, "The early attitude of psychologists towards
group tests was decidedly . . . ."(10)

There are several important differences between a
group test and an individual test. These differences may
be grouped under the following headings:

1. Differences in the number of individuals
measured at the same time, and

2. Differences in the method of testing.

It is logical to assume that the individual test would
yield the more accurate results because the examiner
deals with only one child. When an individual test
is being administered in the psychological clinic, the
examiner is very careful to try to win the confidence of
the child. If the child is antagonistic, fatigued, or


(9) Adams, Jesse E., and Taylor, William S., An
Introduction to Education and the Teaching Process,
pp. 172, The Macmillan Co., 1932.


(10) Pintner, Rudolph, Intelligence Testing: Methods and
Results, pp. 180, Henry Holt & Co., 1931.


badly frightened, it is customary for the school psychologist
to postpone the testing of this pupil until more
favorable testing conditions are obtainable. Pintner calls
attention to this difference between the group test and the
individual test in the following words: "The group test,
therefore, is not as pure a measure of intelligence as the
individual test. The group test contains in its score not
only a measure of the intelligence of the individual, but
also a measure of his willingness to cooperate and put
forth his best effort."(11) Levine and Marks say, "The group
test places the examiner in the attitude of the physician
who administers an anaesthetic without the precaution of
keeping a firm hand on the patient's pulse: the examiner
is denied the corroborative evidence of imaginative
insight."(12)

The gradual evolution of the group intelligence test 
indicates clearly that each new development represented the 
work of some practical psychologist who was faced with a 
problem. As tests for different mental processes were 
multiplied, the group method of testing became popular. 

The transition from the single group test to a series of
group tests, the results of which could be combined into an
intelligence rating, was the logical result of further


(11) Ibid., pp. 183.


(12) Levine, Albert J., and Marks, Louis, Testing
Intelligence and Achievement, pp. 161, The
Macmillan Co., 1928.
experimentation in group testing. Pintner says, "Thorndike
was among the first to see the advantages of this method
and he must certainly be considered the leader in this
. . . ."(13)

It is now accepted educational procedure to interpret
achievement test results in the light of intelligence test
scores. The progressive teacher avails himself of the
group intelligence test in order to find out why a certain
class gets such low marks in its subject-matter
tests. If he finds that the pupils score rather low on the
group intelligence test, this is a fairly good indication that
his instructional methods will have to be varied, and that a
regular program of diagnostic testing and remedial teaching
must be faithfully carried out. Buckingham says, "It is
a curious fact that although many persons felt the
insufficiency of the subject-matter tests, few seemed at first
to realize that what we most needed in order to make our
test scores of real worth was intelligence scores to place
beside them. Until the advent of the group intelligence
test, this was practically . . . ."(14)

We have previously noted how the group test idea was
received at the beginning with skepticism, and even actual
hostility, by many psychologists. Today group tests are
(13) Pintner, Rudolph, Intelligence Testing: Methods and
Results, pp. 181, Henry Holt & Co., 1931.


(14) Buckingham, Burdette Ross, Research for Teachers,
pp. 140, Silver, Burdett & Co., 1926.
being used very widely and have the sanction of all the
psychologists. In the early days of group testing, an
event occurred which supplied a tremendous impetus to the
group test movement. The entrance of the United States
into the World War necessitated the building of a great
army. In order to do this, it soon became apparent that
some system of group mental testing would have to be used
to determine the intelligence levels of the individual
soldiers. It is logical that the men possessing a
high order of intelligence would have the greatest likelihood
of succeeding as officers. On the other hand, the
problem of what to do with prospective soldiers of
inferior intelligence also had to be considered. Later it was
found that many of these individuals could be assigned to
labor battalions; many others, however, were discharged
from the army because of mental defects. The successful
use by the United States Army of group psychological tests
hastened the improvement of this type of test instrument.
Professor Pintner gives the following summary of facts
about the Army Tests: "The work in the army extended from
September, 1917, to January, 1919. Psychological testing
was established in thirty-five camps and altogether 1,726,966
men were tested either by means of group or individual tests.
This total includes 42,000 commissioned officers. Individual
examinations to the number of 82,500 were given. The
psychologists recommended 7,800 for discharge for mental
defect, or 0.5 per cent of the total examined. They
recommended 10,014 or 0.6 per cent for labor battalions
because of low intelligence, and 9,487 or 0.6 per cent for
assignment to development battalions for training and
observation for possible use in the . . . ."(15)

The tests used consisted of two group tests, viz.: the
Alpha, a group test for literates, and the Beta, a group
test for illiterates and foreigners. Besides these, individual
tests such as the Stanford and the Point Scale and
Performance tests were used. To Dr. Arthur S. Otis of
Stanford University goes great credit for the development of
the Army Tests.

According to Adams and Taylor, there are now between
thirty and forty group tests that are rather widely used.(16)

It has now been shown that the group intelligence test
has experienced a great development. Should the reader
assume, then, that the individual intelligence test has
outlived its usefulness? The answer is most emphatically,
"No!" At present the individual-type test is used to
supplement the group test. Both types of instruments are
valuable aids to the school psychologist. Adams and Taylor
say, "The wide use of the group test does not mean that the


(15) Pintner, Rudolph, Intelligence Testing: Methods and 
Results, pp. 318, Henry Holt & Co., 1931. 


(16) Adams, Jesse E., and Taylor, William S., An
Introduction to Education and the Teaching Process,
The Macmillan Co., 1932.
individual test has been discarded. The individual tests
are also widely used, particularly for a more refined
measurement or as a further test when there is a doubt
about the results obtained from an individual who has taken
the group test."(17)
II. Common Types of Material in Group Tests 
At this time it would be logical to make a brief
comparison between the materials suitable for the group
intelligence test and those adaptable for use in achievement
tests. Professor Dearborn makes the following distinction:
"The school examination requires a special bit of knowledge
which has usually been recently acquired; the intelligence
test tests the use of old and fairly common knowledge often
in a new or somewhat unusual way."(18) The materials included
in the group psychological test should involve fairly
common experiences rather than special learning. In this
respect, the mental test resembles the puzzle or riddle,
since the latter does not call for special learning, but
rather for ingenuity in the use of ordinary life experiences.
There are two distinctive features of a good test, viz.: 
1. It utilizes fairly common experiences rather than 
special learning; it calls for ingenuity in the 
attacking of problems yet unsolved by the 
individual. 
2. The test requires a sampling or averaging of the 
individual's abilities. 
(17) Ibid., pp. 172.
(18) Dearborn, Walter Fenno, Intelligence Tests: Their
Significance for School and Society, pp. 55,
Houghton-Mifflin Co., 1928.
There are valid reasons why the group intelligence test
should contain a large number of items. Wheeler and Perkins
summarize these as follows:

"1. With individuals raised in different environments
and having different life-interests, a similar
score in the test will not mean the same for one
person as for another unless the items are
sufficiently numerous to give each person an
equal chance.

2. There must be a sufficient number of items graded
in difficulty to differentiate those persons who
can comprehend only the simpler relationships
from those who are able to grasp more complex
relationships; there must be an appropriate
number of items not too hard for dull individuals
and not too easy for individuals who are brilliant.


3. A wide range of facts and relationships must be 
covered to avoid making a test of specialized 
interests and aptitudes. 

4. The items thus varied and graduated must yield 
results that can be expressed in terms of numbers, 
and these numbers must represent the relative 
position of the individual in the group."(19) 
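Point 4 asks that the numbers show the relative position of the individual in the group. One familiar way to express such a position is the percentile rank. It gives the percentage of the group scoring below a given score, with half of any tied scores counted as below. This is offered as an illustration, not as the method Wheeler and Perkins prescribe. The function name and the class scores are hypothetical.

```python
def percentile_rank(score: float, group_scores: list[float]) -> float:
    """Percent of the group scoring below `score`, counting half of the ties."""
    if not group_scores:
        raise ValueError("group_scores must not be empty")
    below = sum(1 for s in group_scores if s < score)
    ties = sum(1 for s in group_scores if s == score)
    return 100 * (below + 0.5 * ties) / len(group_scores)


# Hypothetical raw scores for a class of ten pupils.
scores = [31, 42, 42, 47, 50, 55, 58, 61, 66, 72]
print(percentile_rank(50, scores))  # 45.0
```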

The most common types of material in group tests will
now be presented:
a. Opposites - The subject is called upon to write down or
indicate the opposite of a given word, or to decide whether
two words denote similar or dissimilar ideas.
Example - Underline the word in parentheses which is the
opposite of the first word:
accept.....(receive, percept, deny, reject, spend)


(19) Wheeler, Raymond Holder, and Perkins, Francis 
Theodore, Principles of Mental Development, pp. 177, 
Thomas Y. Crowell Co., 1932. 
b. Analogies - An analogy between a pair of facts is given
and the subject is called upon to draw a similar analogy in
reference to another pair.
Example - Underline the best of the four words in parentheses:
cellar : attic :: bottom : (well, tub, top, house)
c. Best Reasons - The subject is required to indicate in
some form or other the best answer to a question.
Example - Check the best reason:
Why are criminals locked up?
1. To protect society
2. To get even with them
3. To make them work
d. Disarranged Sentences - A sentence is given in which the
words are disarranged and the subject has to arrange them
properly.
Example - Cross out the superfluous word in the disarranged
sentence:
watch summer the man stole is jail who the in.
e. Proverbs - The subject has to match proverbs having the
same meaning, decide whether they are the same or different
in meaning, or match them with statements that are
identical in meaning. An example of this would consist of
a number of proverbs to be matched with statements that
explain their meaning.
f. Number Completion - The subject is required to determine
the rule or method in a series of numbers and indicate this
in some way.
Example - Write down the two numbers that should come next:
3 6 9 12 15 18 ..... .....
g. Directions - The subject is asked to follow specific
instructions.
Example - Cross out the "g" in tiger.
h. Sentence Completion - The subject is to fill in
omitted words in a sentence or passage.
Example - Write one word on each blank:
The boy.....two dollars to the Red Cross.
i. Information - The subject is required to use his general
information over a wide field.
Example - Underline the correct word:
Euchre is played with dice, rackets, cards, pins.
The Delco System is used in plumbing, filing, ignition,
and cataloguing.


j. Arithmetical Problems - The subject is required to solve
reasoning problems in arithmetic.
k. Word Knowledge - The subject is required to give the
meaning of single words or words in sentences.
Example - Underline the word that means the same or nearly
the same:
kind - (1) open, (2) fall, (3) good, (4) not far,
(5) new.


l. Classification, Generalization - The subject is required
to classify, generalize, or make a logical selection. There
are many tests of this type.
Example - Draw a line under the two words which tell what
the thing always has.
A circle always has: altitude, circumference, latitude,
longitude, radius.
m. Non-Verbal Material - There is a duplication in non-verbal
material of almost all of the verbal types of
material. For obvious reasons no examples will be given
here.
III. Limitations of Our Present Intelligence Tests: 
Validity and Accuracy. 
A. Validity 

Before a test can be accepted as a "good" test, certain
requirements as to its validity and reliability must be
fulfilled. These two topics have been treated in great
detail elsewhere in this thesis (pp. 39-72 inc.). It will
be necessary at this point, however, to supplement what has
already been given, since that discussion applies mainly to
educational tests. Much of what has already been given will
apply equally well to either subject-matter tests or
intelligence tests.

Monroe, DeVoss, and Reagan say, "Since it is extremely
unlikely that any one of our present intelligence
examinations is a 'perfect' instrument, . . . all fail to
yield valid and accurate measures."(20)


(20) Monroe, Walter S., DeVoss, James C., and Reagan,
George W., Educational Psychology, pp. 281, Doubleday,
Doran & Co., Inc., 1930.
If this statement is true, it is safe to say that there are
no group mental tests that are one hundred per cent valid
and reliable. Let us consider at this time the following
questions:
1. How nearly do the results yielded by the test
agree with the results yielded by true measures
of intelligence?
2. How accurately does a given test measure what it
measures?


The determination of the validity of an intelligence 
test would be simple if we possessed some means of securing 
true measures of intelligence. Because such true measures 
are not available, test-makers have used a variety of 
approximations. A careful application of the Stanford 
Revision of the Binet Test is considered one of the most 
widely used criterion measures. 

Because the validity of a test must be determined by 
means of some outside criterion of intelligence, we shall 
now consider some of the common criteria employed. 

1. Chronological Age 

This criterion was employed by Binet in his early work.
According to this criterion, a test of intelligence should
be passed by increasing percentages of children as we go
from the lower to the higher grades. Pintner says, "This
criterion is of limited value and is not commonly used by
psychologists today, although it is useful in helping to
determine the relative discriminating values of a series
of tests."(21)
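The criterion just described can be checked mechanically. Tabulate the percentage of pupils in each grade who pass a test, then see whether that percentage rises from grade to grade. The Python sketch below does this. The grade counts are invented for illustration, and the sketch treats an equal percentage in two adjacent grades as acceptable; that choice is an assumption, not part of the criterion as stated.

```python
def percent_passing(passed: dict[int, int], tested: dict[int, int]) -> dict[int, float]:
    """Percentage of pupils passing the test, grade by grade."""
    return {grade: 100 * passed[grade] / tested[grade] for grade in sorted(tested)}


def rises_with_grade(percents: dict[int, float]) -> bool:
    """True if the passing percentage never falls from a lower to a higher grade."""
    values = [percents[grade] for grade in sorted(percents)]
    return all(lower <= higher for lower, higher in zip(values, values[1:]))


# Hypothetical results: 40 pupils tested in each of grades 3 to 6.
tested = {3: 40, 4: 40, 5: 40, 6: 40}
passed = {3: 8, 4: 14, 5: 22, 6: 31}
percents = percent_passing(passed, tested)
print(percents)                    # {3: 20.0, 4: 35.0, 5: 55.0, 6: 77.5}
print(rises_with_grade(percents))  # True
```

In keeping with Pintner's caution, a test that fails this check is suspect, but passing it proves little by itself.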

2. Known Groups

This criterion was also used by Binet in his early
work. If we have two groups of known intelligence, as for
example, a feebleminded group and a normal one, it is
obvious that a test administered to them should result in
much higher scores for the better group. If there is little
relative difference between the results obtained, then the
test cannot be a good measure of intelligence. In the same
way, the scores on a group intelligence test for a superior
class could be compared with those from a normal group.
Pintner says, "As a first rough measure of the goodness or
badness of a test for intelligence testing purposes, it has
proved . . . ."(22)

3. Teachers’ Judgments 

The judgment of teachers as to the intelligence of their
pupils is frequently used as a criterion of the validity of
a test. The theory behind this is that teachers are in
a most favorable position to judge the relative levels of
intelligence among their pupils, and consequently the
results from the intelligence test ought to correlate
somewhat with their judgment. The results from this criterion
must not be accepted too readily, for the judgments of all
human beings are fallible. Pintner holds that:


(21) Pintner, Rudolph, Intelligence Testing: Methods and 
Results, pp. 105, Henry Holt & Co., 1931. 


(22) Ibid., pp. 106.
"If we are constructing a new test and find that it correlates
between .3 and .6 on the average with teachers' judgments, we
should be satisfied so far as this criterion of validity is
concerned."(23)

4. School Achievement 

The use of school achievement in validating a group 
psychological test may be done by using any one of three 
things, viz., school marks allotted by teachers, scores on 
standard educational tests, or by the rate of progress through 
the grades. Whenever we take any of these ratings, we assume 
that the intelligent child will work more or less up to his 
Gapacity in school work. In regard to rate of progress, we 
assume further that he will be allowed to progress through 
the grades at a rate commensurate with his intelligence. 
Pintner says, “The use of educational tests as measures of 
validity for intelligence tests is, therefore, of limited 
value. An intelligence test should correlate fairly well 
with educational achievement, but we cannot use an education- 
al achievement test as the sole te EO oh 

5. Other Tests 

Another validation method is to correlate the new test with a
known test of intelligence. If the Stanford-Binet is
accepted as a valid intelligence test, it is evident that
the results obtained from it ought to correlate positively


(23) Ibid., pp. 107.
(24) Ibid., pp. 110.
with the results obtained from the test that is being
validated. The important point to stress in connection with
this criterion is that the test accepted as a valid measure
of intelligence must itself have been adequately validated;
otherwise the obtained results will be valueless.
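Agreement with an accepted test is usually expressed as a coefficient of correlation. The Python sketch below computes the Pearson product-moment coefficient for paired scores. The six pairs are hypothetical and stand for a new group test and the Stanford-Binet I.Q.'s of the same pupils. A coefficient near +1 means the two instruments rank the pupils almost alike. A coefficient near zero means they have little in common.

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson product-moment correlation of paired scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two paired scores")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("scores must vary in both lists")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical scores of six pupils on a new group test and their
# Stanford-Binet I.Q.'s.
group_test = [38, 45, 52, 57, 63, 70]
binet_iq = [88, 95, 97, 104, 110, 118]
print(round(pearson_r(group_test, binet_iq), 2))
```

The same computation applied to Form A and Form B of one test gives the two-form reliability coefficient discussed under Reliability.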
6. Combinations of These 

No single validation method explained so far will by
itself result in a complete validation. All suffer
from some drawback; hence it was hoped that a combination
of them would enable test experts to arrive at a better
criterion. Pintner reports that McCall and Liu each used a
composite criterion with no little success.(25) McCall used a
criterion made up of teachers' judgments, Binet, group
intelligence tests and measures of educational achievement.
Liu used the most elaborate criterion for estimating
intelligence. His criterion consisted of (1) age, (2) school
marks, (3) school progress, (4) teachers' estimates of
intelligence, and (5) composite test scores of five group
intelligence scales. The different elements in the criterion
were carefully weighted. The evidence indicates that Liu
obtained very good results with his method. Pintner
concludes that: "From this survey of the various methods
of determining the validity of an intelligence test, we can
see that there is no one method that is infallible. Each


(25) Ibid., pp. 112.
single method is open to objections. The psychologist must
use as many of these methods as he can. The construction
of a composite criterion is undoubtedly the safest . . . ."(26)
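The text does not state how the elements of Liu's composite were combined. One way to build such a criterion is sketched below, as an assumption and not as Liu's actual procedure. Each element is first converted to standard (z) scores so that marks, ratings, and test scores share a common scale, and a weighted average is then taken. The element names and weights are hypothetical.

```python
import statistics


def z_scores(values: list[float]) -> list[float]:
    """Convert raw values to standard scores (mean 0, standard deviation 1)."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        raise ValueError("values must not all be equal")
    return [(v - mean) / sd for v in values]


def composite_criterion(elements: dict[str, list[float]],
                        weights: dict[str, float]) -> list[float]:
    """Weighted average of standardized criterion elements, pupil by pupil."""
    lengths = {len(values) for values in elements.values()}
    if len(lengths) != 1:
        raise ValueError("every element needs one value per pupil")
    n = lengths.pop()
    standardized = {name: z_scores(values) for name, values in elements.items()}
    total_weight = sum(weights[name] for name in elements)
    return [
        sum(weights[name] * standardized[name][i] for name in elements) / total_weight
        for i in range(n)
    ]


# Hypothetical data for four pupils; the weights are illustrative only.
elements = {
    "school_marks": [72, 85, 64, 90],
    "teacher_estimate": [3, 4, 2, 5],
    "group_test_composite": [48, 61, 40, 66],
}
weights = {"school_marks": 1.0, "teacher_estimate": 1.0, "group_test_composite": 2.0}
print([round(c, 2) for c in composite_criterion(elements, weights)])
```

Standardizing first keeps an element with a large numerical range, such as raw test scores, from outweighing one on a small scale, such as a five-point teacher rating. Only the chosen weights then determine each element's influence.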
B. Reliability 

It is not only important that the group psychological 
test be valid but it must also be reliable. The test must 
measure not only what it is intended to measure but it 
must also measure it accurately. As we have seen in our 
previous studies (pages 56-72 inclusive of this thesis), the 
usual measure of reliability is the coefficient of correla- 
tion between the two forms of a test. In view of the fact 
that the subject of reliability has been treated quite fully 
previously in this thesis, the present writer will not 
include any new material. It is sufficient to say that the 
methods of insuring reliability in a new-type educational 
test are identical to those used in reference to the group 
psychological examination. 

IV. Practical Values of Intelligence Tests 
A. Educational Guidance

The group psychological examination is a valuable
source of help to the practical teacher in many situations.
It is especially valuable in guidance work.
It happens quite frequently in large high schools that
certain pupils are not scheduled correctly, and, consequently,


(26) Ibid., pp. 112.
are put into a class of superior pupils. Many times these
unadjusted pupils will become discipline problems because
they have no other outlet for their energies. The
accomplishment of their classmates is so superior that these
"problem cases" are left hopelessly in the rear. Buckingham
makes the following comment: "To place a person of low
intelligence, whether that person is a school child or an
adult worker, in a position which requires a higher degree
of intelligence is to rob him of the satisfaction of success
and to engender the habit of failure. On the other hand,
to place a person of high intelligence in a position which
calls for lower intelligence tends to weaken the fiber of
his moral . . . ."(27) An obvious solution to these problems
of maladjustment is more effective educational guidance.
If unusual cases arise, the school psychologist's aid should
be sought.

Now, this problem of effective educational guidance is
not so simple as it may seem. In many cases, it is
difficult or almost impossible for the classroom teacher to
judge correctly the intelligence of a certain pupil, and
the teacher's impressions alone will not insure a correct
analysis of the pupil's mental level. In many cases, the
teacher's impressions are biased by the
appearance of the pupil. Good clothes and external
(27) Buckingham, Burdette Ross, Research for Teachers,
pp. 162, Silver, Burdett & Co., 1926.
appearances of health are often deciding factors in the
teacher's estimate. Many times the teacher is deluded
by a sprightly attitude on the part of the pupil.
Adams and Taylor say, "For the most part mental
measurements have been of great value in helping us to detect
more accurately the level of intelligence of individuals.
They furnish a standard to grade by, which in itself is of
material value."(28)
B. Sectioning of Classes 
Another important use made of the group mental test
is in the sectioning of classes. Many high schools and
colleges section their classes according to ability with
the intent of varying the instruction offered in order to
meet the intellectual level of the particular classes.
As this topic has been treated in detail elsewhere in this
study (pages 168-175 inclusive), it is not necessary to
take up any additional material.
C. Surveys 
The group psychological examination has been very
important in the making of educational surveys. Because the
same examination can be given to pupils in many places, it
makes possible comparisons between communities.
D. Vocational Guidance Values 
Mental tests have value for predictive purposes in both
the vocational and the educational fields. They are of
(28) Adams, Jesse E., and Taylor, William S., An
Introduction to Education and the Teaching Process,
pp. 176, The Macmillan Co., 1932.
great assistance to the employer in selecting employees,
and also to the individual, who can use them to find out
the vocation for which he is best suited. The results from
mental tests are invaluable to the vocational counselor in
helping him advise pupils as to what type of course to take
in high school. It follows logically that a pupil of low
intelligence who elects the technical course which prepares
for the college of engineering should be advised to change
his course because of his limited chances of succeeding.
Ragsdale holds that: "By using intelligence test scores
in connection with other information which has been
obtained about a pupil, it has been found possible to predict
fairly well the course of his future scholastic . . . ."(29)
Wheeler and Perkins say, "Tests are useful in high school
in giving vocational advice, in helping the student select
his course, and in determining the rate at which the student
should attempt to . . . ."(30) The weight of authority tends
toward the conclusion that the psychologists, by their
development of the group psychological examination, have
performed a signal service to the schools of America.
V. Group Intelligence Tests Suitable for Use in Secondary
Schools
At the beginning of this section the present writer 


(29) Ragsdale, Clarence E., Modern Psychologies and
Education, pp. 227, The Macmillan Co., 1932.


(30) Wheeler, Raymond Holder, and Perkins, Francis Theodore, 
Principles of Mental Development, pp. 190, Thomas Y. 
Crowell Co., 1932. 
wishes to call attention to the impossibility of describing
and explaining all of the group psychological tests that
have been published. It is undoubtedly true that some group
tests of considerable value are not very well known because
no effort has been made to publish them. Many school
systems and colleges have developed their own psychological
examinations, yet in spite of the fact that these
examinations possess great value, they are used only in the
particular school system or institution where they were
developed. The method that will be employed in this section
will be to give a short description of some of the most
commonly used and readily available group tests.

Professors Douglass and Boardman in their latest book
recommend the following group psychological tests for high
school use: 1. Haggerty Intelligence Examination, Delta 2;
2. Miller Mental Ability Test; 3. Otis Self-Administering
Test of Mental Ability, Higher Examination; 4. Pressey
Cross-Out Test; and 5. Terman Group Test of Mental
Ability.(31) It is readily apparent that some of these tests
have received wider publicity than others. The present writer
proposes to discuss these tests briefly and then to supplement
this by a short consideration of other group intelligence tests
that are well known.
(31) Douglass, Harl R., and Boardman, Charles W.,
Supervision in Secondary Schools, pp. 5535,
Houghton-Mifflin Co., 1934.
Haggerty Intelligence Examination, Delta 2.

This test consists of six exercises as follows: 
1. Discrimination between true and false state- 
ments; 2. Arithmetical problems; 3. Picture 
Completion; 4. Discrimination between words,
whether same or opposite; 5. Common sense 
judgments; and 6. General information. Published 
by World Book Co., Yonkers, New York. This test 
is suitable for grades 3-9. No reliability 
coefficient is reported, although the author re- 
ports that Stenquist found an (r) of .81 on this 
test after testing five hundred children from 
grades 4-8 inclusive. The estimated reliability 
for a single grade is .6. The test comes in one 
form. Time required, 20 minutes. Author, M. E. 
Haggerty. 


Miller Mental Ability Test 

Consists of three tests: 1. Disarranged sentences
combined with directions; 2. Controlled association;
3. Analogies. Suitable for grades 7-12.
The test comes in two forms. Published by the
World Book Company. Time required is 20 minutes.
The reliability coefficient for re-testing 109
pupils in Grade 10 is .91. The standard deviation
is reported as 14.3. Author, W. S. Miller.


Otis Self-Administering Test of Mental Ability

This examination is arranged for two levels, viz.,
the Intermediate Examination for Grades IV to IX
and the Advanced Examination for High Schools
and Colleges. It is a very easy test to administer,
as the subject merely reads over the directions on the
first page of the test, and these directions give
samples of all the different kinds of items that
appear in the test proper. The examination for
each level is furnished in two forms, Form A
and Form B. Neither examination is divided into
sub-tests; instead, different types of items appear
mixed throughout the test, beginning with easy
items and proceeding to more difficult ones.

There are 75 items in each examination. The
reliability for the Intermediate Examination for
Grades IV to IX is .95 and for the Advanced
Examination for Grades VII to XII is .92. The
reliability coefficient for the Advanced Examination
is based on a sample group of 2535 pupils
from grades 7 to 12. The standard deviation is
reported as 13.82. Author, A. S. Otis.


Pressey Cross-Out Test

There are four exercises in this test, each calling
for the same type of response, viz., crossing
out something. The test is useful from Grade III
to High School. In test one, the subject is
called upon to cross out the superfluous word in
disarranged sentences; in test two, the superfluous
word in lists of related words; and in test three,
the superfluous number in a number series. Test four
is a moral judgment test in which the worst thing in
each list is to be crossed out. There are excellent norms
for these tests for ages 10 to 17, and for Grades
III to XII. No information relative to validity
and reliability is available.


Terman Group Test of Mental Ability

This test is one of the most frequently used
tests for high school purposes. It is suitable
for grades 7 to 12 inclusive. Age and grade
norms are available. The examination consists
of ten parts, viz.: 1. Information; 2. Best
answer; 3. Word meaning; 4. Logical selection;
5. Arithmetical problems; 6. Sentence meaning;
7. Analogies; 8. Mixed sentences; 9. Classification;
10. Number series. The reliability for
132 cases in Grade IX is .89. This test comes
in two forms. Published by World Book Co. The
standard deviation is reported as 24.2. Time,
27 minutes. Author, L. M. Terman.


Dearborn Intelligence Scale, Series II. 

This scale is adapted to Grades IV to XII. It consists of two
examinations containing the following tests:
1. Picture sequences; 2. Word sequences; 3. Form
completion; 4. Opposite completion; 5. Faulty
pictures; 6. Disarranged proverbs; 7. Number
problems. Norms for ages 6 to 20, and for Grades
II to XII, are given. Published by Lippincott Co.
Author, Walter F. Dearborn. No information is
available as regards validity and reliability.


The Otis Group Intelligence Test

This group test applies to Grades V to XII. It
is divided into ten parts, viz.: 1. Following
printed directions; 2. Opposites; 3. Disarranged
sentences; 4. Matching proverbs; 5. Arithmetic;
6. Geometric figures; 7. Analogies; 8. Similarities;
9. Narrative completion; and 10. Memory.
The reliability coefficient for Grades IV to
VIII is given as .967. Author, A. S. Otis.


Detroit Advanced Intelligence Test 

This test is designed for high school and college
use. It consists of the following parts: 1.
Information; 2. Opposites; 3. Classification;
4. Number sequence; 5. Block designs; 6. Spelling;
7. Analogies; 8. Mixed-up sentences. No information
is available in regard to validity and
reliability. Norms are given for ages 9 to 25 and
letter ratings for ages 11 to 16.


Thurstone Psychological Examination 

This test can be used effectively for all years
in high school or college. It contains a large
number of problems involving analogies, number
completion, logical reasoning, mental arithmetic,
general information, sentence completion,
proverb matching, and the like. The items are
arranged in spiral order, and the
different types are thoroughly mixed. The
same type of problem occurs again and again,
beginning with the easiest examples and becoming
gradually harder. The reliability
coefficient is reported as .959 and was obtained
by working with 250 subjects. The
reliabilities of the separate parts vary from
.71 to .98.
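Each entry in this survey reports a reliability coefficient, and the Miller entry states that its figure was obtained by re-testing the same pupils. As an illustration only, the sketch below assumes that a test-retest coefficient of this kind is the ordinary Pearson product-moment correlation between the scores of the same pupils on two administrations. The function name `pearson_r` and the five pairs of scores are hypothetical and are not taken from any of the tests described.

```python
from math import sqrt


def pearson_r(first, second):
    """Pearson product-moment correlation between two score lists.

    Assumes first[i] and second[i] are the scores of the same pupil
    on the first and second administration of a test.
    """
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equal-length lists of at least two scores")
    n = len(first)
    mean_1 = sum(first) / n
    mean_2 = sum(second) / n
    # Sum of cross-products of each pupil's deviations from the two means
    cross = sum((a - mean_1) * (b - mean_2) for a, b in zip(first, second))
    # Sums of squared deviations for each administration
    ss_1 = sum((a - mean_1) ** 2 for a in first)
    ss_2 = sum((b - mean_2) ** 2 for b in second)
    if ss_1 == 0 or ss_2 == 0:
        raise ValueError("scores must vary on both administrations")
    return cross / sqrt(ss_1 * ss_2)


# Hypothetical scores for five pupils tested twice (illustrative only)
first_testing = [40, 52, 61, 45, 70]
second_testing = [42, 50, 63, 47, 68]
r = pearson_r(first_testing, second_testing)
print(f"test-retest reliability r = {r:.2f}")
```

A coefficient near 1.0, like the .91 reported for the Miller test, means the pupils kept nearly the same relative standing on both occasions. A value near zero would mean the first score told little about the second.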
L. CONCLUSIONS


The necessity for a well-rounded testing program has 
been frequently emphasized in this thesis. If testing in 
the social-business studies is to yield rich dividends in 
the form of improved teaching efficiency, it will be through 
the medium of a planned and well-organized testing program. 
The busy teacher with large classes is too prone to give
tests whenever it suits his personal convenience; this may
or may not be the time when a test should be given if a
definite test program is followed. Critics of the test
program can no longer clamor that the giving and scoring
of tests consumes too much time. Such arguments lose their
validity when it is proved that the achievement of large
and small classes may be determined equally well by the
use of the new-type test techniques.

There are many other desirable ideas that should be 
advanced in connection with the well-rounded testing 
program. It is safe to conclude that a testing program 
is rarely worth the time, effort, and money expended if it 
does not result in some worth-while modification in class- 
room procedures and practices, materials and methods of 
instruction, class and school organization and management, 
or some other phase of school work. A testing program
that does not measure up to the accepted standards should
be immediately challenged. In such a situation, the
Commercial Department Head should present the problem to
his teachers and enlist their aid and advice in effecting
the necessary revision. Such a step is based on sound
psychology, as it attains two worth-while ends: first, the
teachers will more readily support a test program that they
themselves have formulated, and, secondly, the new test
program will be more effective after the elimination of
obvious weaknesses.

The following specific conclusions are maintained in 
this thesis: 
1. As commercial teachers experience difficulty with the
testing problem in the social-business subjects, this field
offers abundant opportunity for research work. The field
has not yet been covered adequately, although great
strides have been taken.
2. The test concept is very ancient. Tests and examinations
of various kinds were in use hundreds and even thousands of 
years ago among such people as the Chinese, the Greeks, and 
the Romans. 
3. A testing program should consist of both standardized and
non-standardized tests. In most cases, the number of the
latter should exceed that of the former, their respective
proportions depending partly on how satisfactory a supply
of standard tests is available in the subject being dealt with.
4. In its present state of development the new-type test 
is best suited to test the acquisition of information. 

5. It is in the use of tests that the greatest hope for 
scientific guidance lies. 

6. Although educational measurements are of great value to
the supervisor and administrator, their most valuable
contributions have been made in the improvement of teaching.
7. The testing program must provide for comparable tests. 
Test results should be entered on cumulative records made
available for reference purposes to the entire staff. 

8. Standards, as well as norms, should be provided for 
tests. 

9. The testing program must not be limited to one standard 
test administered late in the year. To do this defeats the 
objectives of the entire testing program.

10. Provision must be made for continuous systematic test- 
ing of pupils at regular intervals during the year. The 
further instruction of the pupils should be adapted to 
their needs as shown by a study of the test results. 

11. The giving of tests is of little value in itself; it
should be followed by diagnosis and remedial measures.

12. Supervision of testing is as important as supervision
of teaching. It devolves upon the Commercial Department Head
to examine the tests his teachers are using, and to assist
them in selecting or planning suitable ones.

13. It is desirable that the specifications of a marking 
system should grow out of a cooperative study of the problem 
by all the teachers concerned rather than that they be 
handed down by the executives in the system. 

14. School marks should be based solely on the results of
tests as far as possible.

15. From the evidence, it appears that relative marking 
systems are more justifiable than the absolute types. 

16. The marking system adopted in a school must be adhered 
to by all members of the faculty. A teacher who constantly 
deviates from the accepted rules is being unfair to the 
entire staff. 

17. The problem of individual differences can be met at 
least partially by ability grouping. 

18. Pupil classification should be neither haphazard nor 
arbitrary. 

19. There is no royal road to homogeneous sectioning. All
the evidence that is obtainable must be considered carefully;
in addition, provision must be made for the constant shifting
of pupils from one section to another whenever instances of
maladjustment occur.


20. Every good-sized secondary school should, if possible,
have a testing bureau. This bureau should be a place
where the teacher can go for desired assistance in the
construction of the various types of modern tests.

The principles underlying a well-rounded testing
program have now been explained. The present writer feels
confident that a testing program such as he advocates
would go far toward obtaining the desired results. Whether
progress is made depends to a large extent upon the
individual teacher. There are just two ways to meet the
situation. One is a policy of inaction and stagnation; the
other, a policy of action. The hope of the teaching
profession lies in the latter.